Andreas Fickers and Florentina Armaselu
Introduction


Abstract: This chapter outlines the conceptual framework of the book and the va- 
riety of viewpoints related to the use of the notion of scale and zooming in digital 
history and humanities. The contributions included in the volume encompass dif- 
ferent degrees of theoretical assumptions, practical insights and middle-ground 
reflections, symbolically expressed through the three conceptual levels: bird's-eye 
view, overhead view and ground view. While no general theory of scale is defined, 
the reader is offered the ingredients needed to build such theoretical constructs 
based on his or her own exploratory and symbolic journey through Zoomland. 
This variable-scale representation is combined with four categories of enquiry or 
thematic realms that make up the territory of Zoomland: History, Media, Herme- 
neutics and Digital landscapes. 


Keywords: scale, zooming, digital history, digital humanities, multiple perspectives 


Welcome to Zoomland! 


Imagine you could travel to Zoomland - both physically and virtually. This is what 
this book - in both its printed and online open access version - is offering. To explore 
its multi-layered landscape, you can delve into this opening chapter for a more conventional
introduction to the topic or, if you are looking for a thrilling adventure, embark
on an experimental online game¹ set on the uncharted island of ZOOMLAND.


Acknowledgement: The authors would like to thank Sarah Cooper, from the Language Centre of
the University of Luxembourg, for English proofreading.


Note: This book is the result of a thinkering grant awarded in 2019 by the C²DH to the first editor. The
initial project included a call for papers, launched in October 2021, and a cross-reviewing workshop,
which was held at the C²DH and online in June 2022 so that the authors could come together to
collectively discuss the selected proposals. During the discussion, it was also decided which of the
three conceptual or symbolic levels or views should be assigned to each chapter, to foster multi-scale
explorations of Zoomland. The criteria used for this purpose involved aspects such as the degree
and coverage of theoretical considerations or the particular focus on a specific project or topic,
and the in-between possibilities for middle-ground enquiries. The contributions included in the volume
represent the outcome of the peer-review and selection process that followed the workshop.
The symbolic book-land(scape) representation of Zoomland was initially devised by Florentina Armaselu
as a 2D visualisation using symbols and colours for the three perspectives and four thematic
areas to make up a landscape. The three symbols, suggesting a certain object size or position relative
to the ground, were assigned to the chapters according to the workshop discussion; the representation
eventually served as a starting point for the game.


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial 4.0 International License.
https://doi.org/10.1515/9783111317779-001


Figure 1: ZOOMLAND game: physical map. 


Figure 2: ZOOMLAND game: symbolic map, with symbols for bird's-eye view, overhead view and ground view,
and thematic colour code: History, Media, Hermeneutics, Digital landscapes.


1 Zoomland, accessed on July 25, 2023, https://www.c2dh.uni.lu/zoomland. 
Designed by Kirill Mitsurov and Daniele Guido, the online game, inspired by
the book-land(scape) metaphor, invites the player/reader to embark on the small
island of ZOOMLAND and explore the unknown territory (Figure 1) by looking for
signs and symbols that represent different heuristic perspectives and thematic
entry points from which the content of the book can be discovered and explored.
In order to do so, the player/reader has to collect the chapter cards assigned to
the various objects scattered across the island. These objects, according to their
size or position relative to the ground, can stand for three conceptual standpoints -
bird’s-eye view, overhead view and ground view - one of which is attributed
to each chapter. Through this quest, the player builds a symbolic map of
the island (Figure 2) that offers another way of looking at the configuration and
nature of the assembled pieces and another means of accessing the actual manuscript
of Zoomland.

A voyage into Zoomland feels like an encounter with the fairy-tale figure of 
Tur Tur, the imaginary giant in Michael Ende’s famous children’s book “Jim Button
and Luke the Engine Driver” (Jim Knopf und Lukas der Lokomotivführer in the German
original) from 1960. Jim Button is a little black boy living on the tiny island of
Morrowland. When Jim grows bigger, there is simply no longer enough space for
everybody. Someone must go, decides King Alfred the Quarter-to-Twelfth. But
should that someone really be Emma, the locomotive of Jim’s best friend Luke? Jim
cannot allow that. Together with the engine driver and Emma the locomotive, he
leaves the island and sets off on a great adventure: transparent trees, stripy mountains
and dragons cross their path.² In a desperate episode on the journey, in
which Jim and Luke traverse a desert at the end of the world, they see a giant apparition
on the horizon. Although Jim is frightened, they finally decide to wave to the
figure and tell it to come closer. And to their great surprise, the long-bearded figure
gets smaller with every step closer he takes. When the friendly old man finally
stands in front of them, he has shrunk to the size of a normal person and presents
himself as Tur Tur the imaginary giant. “Good day”, he says. “I really don't know
how I can thank you enough for not running away from me. For years I have been
longing to meet someone who got such courage as you, but no one has ever allowed
me to come near them. I only look so terribly big from a long way off” (Ende 1963:
123-124).

After the initial shock has worn off, Jim and Luke ask Tur Tur to explain the 
nature of his existence as a make-believe giant. 


2 Michael Ende, accessed on July 29, 2023,
https://michaelende.de/en/book/jim-button-and-luke-engine-driver.
Figure 3: Jim Button and Luke the engine driver meet Tur Tur, the make-believe giant. In: Michael
Ende,³ Jim Button and Luke the Engine Driver, Woodstock (N.Y.): The Overlook Press, 1963, pp. 120-121.


“Well,” said Tur Tur, “there isn’t really an awful lot to tell. You see, my friends, if one of you 
were to get up now and go away, you would grow smaller and smaller till in the end you 
only look like a dot. And if you started to come back, you would grow bigger and bigger till 
you would stand in front of us your proper size. You must agree, though, that in reality you 
would have remained the same size all the time. It only looks as if you’ve grown smaller 
and smaller, and then bigger and bigger.” 

“Exactly,” said Luke. 

“Well, now,” explained Tur Tur, “in my case it is simply the other way about. That's all.
The farther away I am away, the bigger I seem. And the closer I come, the better you can
see my real size. My nature is simply the opposite to yours.”

“You mean to say,” asked Luke, “you don't really grow smaller when you come nearer?
And you aren't really the size of a giant when you are far away, but it only seems like it?”

“That's right,” replied Mr. Tur Tur. “That's why I said I am a make-believe giant. Just as
you might call the other people make-believe dwarfs, because they look in the distance like
dwarfs, though they aren't dwarfs, really.”


3 Reproduced with permission (© Michael Ende, illustrator F.J. Tripp, Jim Knopf und Lukas der
Lokomotivführer, Thienemann-Esslinger Verlag GmbH). 
“This is very interesting,” mumbled Luke, and he thoughtfully blew a few smoke rings.
(Ende 1963: 127) 


In the conclusion of the treatise “Aufstieg und Fall der Zentralperspektive” (Rise 
and Fall of the Central Perspective, 2004), the literary scholar Dieter Borchmeyer 
explains: “The ‘illusory giant’ thus turns perspective foreshortening on its head. 
[. . .] In the inverted world of the illusory giant, perspective appearance becomes a
plaything of the imagination - a signal that central perspective, once the proud
achievement of a thoroughly rationalised, scientifically dominated world, has
played out its historical role forever” (Borchmeyer 2004: 310). To us, the central or
linear perspective, an artistic technique developed in Renaissance paintings, has 
become a by-word for modernity and rationality, in line with the concept of linear 
time and geometrically measured space. 


1 Zooming as a metaphor of knowledge 
production and heuristic practice 


The “birth” of the so-called early modern period in the 16th century was character- 
ised by a fruitful exchange and interplay between new techniques, instruments and 
experimental practices in both the sciences and the arts and crafts (Rossi 1997). 
Magic and science were intertwined (Daston and Park 1998), but with the use of in- 
struments such as the telescope and the microscope, the nature of scientific observa- 
tion changed dramatically — and with it notions of truth, objectivity and reality 
(Stengers 1993). As “leading instruments” of the early modern process of natural enquiry,
both the telescope and the microscope remained rooted in the ancient ideal of
visual perception, with the eye being the most trustworthy and objective sense - seeing is
believing (Weigl 1990: 16-17). The ability to get closer to matter, either by zooming in 
(microscope) or zooming out (telescope), radically changed the perspective of the ex- 
ploratory mind. It triggered philosophical — today we would say epistemological — de- 
bates about the nature of cognition as well as the intricate relationship between our 
senses and technical instruments in the co-construction of reality. As the etymological
roots of the Greek term σκοπέω (skopéo) - “to look, examine, or inspect” or figuratively
“to contemplate, consider” - suggest,⁴ instruments such as the telescope, the
microscope and later the stethoscope in medicine or the periscope in nautical navigation
are technologies of knowledge that fostered a whole set of new heuristic practices.
These practices had to be appropriated through critical learning and testing, and
it was only once a degree of mastery in handling such instruments had been acquired
that they could become epistemic objects in the co-construction of new knowledge
(Rheinberger 2008).


4 Accessed on July 23, 2023, https://en.wiktionary.org/wiki/σκοπέω.

As a cultural practice, zooming gained popularity with the development of vari- 
focal lenses in photography in the second half of the 19th century (Kingslake 1989). 
Preceding this, playing with multiple lenses to create the impression of movement 
had been a widespread artistic practice employed by so-called lanternists, who pro- 
duced shows with the help of magic lanterns, theatrical performances and sound 
effects (Jolly and DeCourcy 2020). As Étienne-Gaspard Robert, an experienced projectionist
and inventor of the “Phantasmagoria” - a show combining the projection
of glass slides and shadow play - explained, the zoom-like special effects created 
suspense and surprise amongst the spectators: “At a great distance a point of light 
appeared; a figure could be made out, at first very small, but then approaching with 
slow steps, and at each step seeming to grow; soon, now of immense size, the phan- 
tom advanced under the eyes of the spectator and, at the moment when the latter 
let out a cry, disappeared with unimaginable suddenness” (Heard 2006: 97). This 
scene of a magic lantern show from the 1790s reads like an early anticipation of
the Tur Tur scene by Michael Ende - albeit with a different purpose. While the lan- 
ternist made use of a visual trick to enchant his audience, Ende aimed to make his 
young (and no doubt also his adult) readers think about the tension between experi- 
ence-based visual perceptions and popular imagination and narrative conventions. 
As Nick Hall has outlined in his study about the aesthetic and narrative functions of 
zooming in film and television, it is the “impression of being transported forward or 
backward at high speed” that makes zooming such a powerful technique in drama- 
tized filmic narratives (Hall 2018: 1). 

Technically, the zoom lens is a lens with continuously variable focal length. 
The variation of the focal length creates an apparent movement forward or back- 
ward - but this movement is different from a physically moving camera. As Hall 
explains: “When a camera moves, the position of objects in the images it captures 
change relative to one another. From these movements, viewers can infer the 
size, shape, and depth of the space in front of the camera. The zoom, by contrast, 
simply magnifies the image in front of the camera, and no change occurs to the 
relative position of the objects in front of it" (Hall 2018: 8). As such, the zoom lens 
basically magnifies or reduces the size of a picture,⁵ but depending on the length
or speed of the zoom shot, its aesthetic effect and dramatic force can vary greatly.
The psychological properties of the zoom lens are therefore inextricable from the
phenomenology of cinematic or televisual vision (Sobchack 1990). With the “zoom
craze” in amateur film of the 1950s and 1960s, zooming conquered the non-professional
realm of home movie or family film (Hall 2018: 123-152), preparing a
larger audience for the digital revolution to come. Today, zooming in and out on
the screens of our mobile phones or desktop computers has become “one of our
predominant twenty-first century ‘ways of seeing’”, argues Tom White. “We zoom
in and out on manuscript images in much the same way we zoom in and out on
text, maps, websites, and our personal photos”.⁶


5 The change of scale in the reproduction of an image was also an issue in printing technology.
With the “fougéadoire”, an invention in lithography patented by the Frenchman Auguste Fougeadoire
in 1886, it was possible to reduce or enlarge the size of an illustration based on rubber by
stretching or compressing the rubber plate. See https://fr.wikipedia.org/wiki/Fougéadoire,
accessed July 17, 2023.


2 Zooming in/out and the art of scalable reading 


Scholars in digital history and humanities are increasingly interested in the meta- 
phor of zooming and scale as a way to imagine and revisit concepts, methods and 
tools for processing, analysing and interpreting data, and for creating innovative 
representations of knowledge. New terms and expressions have been added to 
the digital humanities vocabulary, which convey a particular conceptual frame 
based on the potential of digital methods and tools to support new forms of data 
exploration and various types of functionality. The aim may be to balance out the 
global, universal standpoint of “big data” with a “small data” world view, along
with “every point in between”, as in the case of the macroscope: “What is more, a
‘macroscope’ - a tool for looking at the very big - deliberately suggests a scientist’s
workbench, where the investigator moves between different tools for exploring
different scales, keeping notes in a lab notebook” (Graham et al. 2016:
XVI). To enable the combination of distant reading (Moretti 2013) with close reading,
a new cultural technique of scalable reading is necessary: through changes of
perspective by zooming in and out, from a bird's-eye view to close up, new forms
of “intertextual analysis” (Mueller 2014) are opened up. The aim of scalable reading
is to move “gracefully between micro and macro without losing sight of either
one”, to see “patterns” and “outliers”, to zoom in and zoom out and to understand
what “makes a work distinctive” within a “very large context” (Flanders and Jockers
2013). In a similar vein, the use of geographic information systems (GIS) and
Web-based spatial technologies to build “spatial narratives” that capture multiple
voices, views and memories to be seen and examined at “various scales” (Bodenhamer
et al. 2015: 5) is promoted as a new form of historical and cultural exegesis
of space and time (deep mapping) within what is called the spatial humanities.


6 Tom White, “A Working History of Digital Zoom, Medieval to Modern,” in Humanities Commons
(2023), https://doi.org/10.17613/2tfp-de60.

Some disciplines in the humanities, such as history, have previously inte- 
grated the concept of scale into their discourse, thus providing starting points for 
theoretical and methodological enquiry within a digitally oriented context. Exam- 
ples from this area include conceptual constructs such as the division of historical 
temporalities into long-, middle- and short-term history, referring to “quasi- 
immobile”, “slowly paced” and “rapid” processes and events taking place at the 
environmental, social and individual levels (Braudel 1976: 11-12). Other reflec- 
tions target the interconnections between the notion of scale in history and its 
counterparts in cartography, architecture and optics, with reference to the degree 
of detail or available information at a certain level of organisation, the construc- 
tion of a historiographic object or the operational metaphors of “magnifying 
glass”, “microscope” and “telescope”, as applied to historical discourse (Ricoeur 
2000: 268-270). Scale in history also serves to define the “historical universe”, a 
continuum in which one pole is occupied by “syntheses of extreme generality — 
universal histories” while the opposite pole is ascribed to “investigations of atom-like
events” (Kracauer 2014: 104). Combined approaches are also possible, with
studies in “global microhistory” that integrate micro and macro analysis, sometimes
supported by digitised libraries, archival collections and websites dedicated
to family genealogies, to connect microhistories of individuals with broader 
scenes and contexts on a global scale (Trivellato 2011). 

Recent theoretical approaches have pointed out the importance of scale, under- 
stood as both an ontological and epistemological entity, and the need to consider 
broader standpoints and a wider disciplinary spectrum when analysing it. Horton's 
(2021) viewpoint on the cosmic zoom provides such an example - that is, a concep- 
tual framework for thinking about scale and its medial manifestations, in an at- 
tempt to capture totality as a world view, from the microscopic to the cosmic 
stance. Starting from Boeke's book Cosmic View (1957), and its legacy, Powers of 
Ten, a 1977 film by Ray and Charles Eames, and Morrison's 1979 review of Boeke's 
book, Horton sets out to examine the cosmic zoom through the lens of media analy- 
sis. He considers both analogue and digital media, spanning various representa- 
tional and compositional scalar standpoints, from literary and cinematographic to 
database-driven forms of mediation. Similarly, by arguing that scale represents a 
significant concept in all sciences, as well as in culture, language and society, DiCa- 
glio (2021) devises six thought experiments that serve as a basis for elaborating a 
general theory of scale, intended to apply to a wide range of disciplines. In the
first of these experiments, he defines resolution as a key element of scale: “Resolution
is the amount of detail one can discern within an observation. [. . .] At different
resolutions, different objects are discerned [. . .] Shifts in resolution and shifts
in scale go hand in hand: scale tracks the range of observation while resolution
points to the amount of detail able to be seen at that range” (DiCaglio 2021: 23). This 
assumption allows him to distinguish between different types of scaling: he con- 
trasts Gulliver’s scaling, in the sense of “making objects bigger or smaller”, carto- 
graphic scaling, as a matter of “representation” and “transformation of reality”, 
and cinematic zooming, as a “result of magnification” or “moving forward”, with 
scaling that involves a change in resolution, a “transformation of observation”, as 
illustrated by the scalar transformations operated in the Powers of Ten - from the 
view of two picnickers in a park to planets and galaxies and back down to cells and
subatomic particles (DiCaglio 2021: 26-27, 70, 232). The main goal of the book, 
grounded in rhetoric, philosophy, science studies and critical theory, is therefore to 
train the reader in scalar thinking, in other words in understanding how scale re- 
configures reality and the main conceptual, perceptual and discursive aspects of 
scale involved in this reconfiguration process. 


3 Playing with layers and perspectives: Bird’s-eye 
view, overhead view, ground view 


Despite this variety of theoretical and practical undertakings, which proves the 
richness and significance of the topic of scale and the interest in it manifested by 
researchers from different areas of study, the potential and concrete application 
of this concept to new forms of analysis and knowledge production in digital his- 
tory and humanities are still largely unexplored. This book proposes a systematic 
discussion on the epistemological dimensions, hermeneutic methods, empirical 
tools and aesthetic logic pertaining to scale and its innovative possibilities resid- 
ing in humanities-based approaches and digital technologies. Taking a variety of 
viewpoints from scholars experiencing this notion in digital history and digital 
humanities, the edited volume gathers theoretical and application-related per- 
spectives, from microhistory and visual projections of historical knowledge in 
graphs to shot scales in television adaptations, scalable reading and cartographic 
zooming, and fosters reflections on the potential for novelty and creative explora- 
tion of the concept of scale when combined with digital humanities methods. 

By navigating through various themes considered in relation to the notion
of scale, such as historical storytelling, online virality, literary computing, media, 
text and tool analysis, data-driven narrative and map modelling, the reader can 
learn about the variety of scales used within these different areas of research. 
Each chapter encompasses different degrees of theoretical assumptions, practical 
insights and middle-ground reflections, symbolically expressed through the three 
conceptual levels: bird's-eye view, overhead view and ground view. While no general
theory of scale is defined, the reader is offered the ingredients needed to
build such theoretical constructs based on his or her own exploratory and sym- 
bolic journey through Zoomland. This variable-scale representation is therefore 
combined with four categories of enquiry or thematic realms that make up the 
territory of Zoomland: History, Media, Hermeneutics and Digital landscapes. 


3.1 History 


This thematic area offers different perspectives on the potential of the concepts 
of zoom and scale for digital history projects. Through a focus on narrative
history, Alex Butterworth engages the reader in an illustrative and speculative 
adventure. He combines reflections on semantic modelling and narrative practi- 
ces from existing projects, such as The Lysander Flights, Tools of Knowledge and 
Crimes in London, with designs for future graphic interfaces that bring together 
knowledge graph formalisms, filmic and narrative grammars and conceptualisa- 
tions of zoom defined along the detail, abstraction and cognitive axes, to support 
historical storytelling. From a similar vantage point, Christian Wachter's account
of the democratic discourses in the press in the Weimar Republic outlines
the conceptual bases of a methodological framework that elaborates on theoreti- 
cal and technical constructs such as the discourse-historical approach, corpus an- 
notation and analysis, and scalable reading, understood as a digitally assisted 
technique for pattern search and identification, as well as close inspection and 
interpretation. The proposed framework is therefore intended for both quantita- 
tive and qualitative analysis, as well as contextualisation and interpretation of 
discourse in historical research. Amanda Madden’s Menocchio Mapped follows
an intermediate-level line of enquiry that bridges micro- and digital history
methodologies and narrative and quantitative approaches. It illustrates the topic by 
crossing various scales of analysis, from bird’s-eye view perspectives on the two 
fields of investigation to historical GIS projects, such as Mapping Decline, The Atlantic 
Networks Project and DECIMA, and vivid fragments with a micro-historical flavour, 
like a revenge poem from the diary of a 16th-century nun. While articulated
within the same micro- and digital history setting, Mariaelena DiBenigno and
Khanh Vo’s chapter adopts a different viewpoint that delineates a particular thematic
focus. In their study, centred on runaway slave advertisements in the United
States in the 19th century, the authors argue that the shift from macro to micro al- 
lowed by digital technologies may illuminate how sources and data are used in his- 
torical narrative reconstruction and foster new means of historical storytelling, 
ranging from nationwide accounts to the experiences of individual and marginalised
communities. 


3.2 Media 


Within the second Zoomland area, the reader can explore the use of scale in 
media-related studies from a middle- and ground-view perspective. Fred Pailler
and Valérie Schafer propose an analysis of the history of online virality based
on a range of examples from the early 1990s — Godwin's Law and Dancing Baby — 
to the more recent Harlem Shake and Distracted Boyfriend meme. The authors in- 
spect a series of visualisations and media inventories, from the more traditional 
press to social media platforms, and the way in which these phenomena spread, to 
devise an approach that intertwines scalable readings of content, encompassing 
spatial and temporal dimensions with a cross-media reading of context. Adopting
a median approach between empirical and theoretical analysis, Landry Digeon
and Anjal Amin develop a comparative study of shot scale patterns in two TV series,
American Law & Order: Criminal Intent and its French adaptation, Paris: Enquêtes
Criminelles. Their methodology explores shot scaling as a cinematographic
device and carrier of meaning, while also using machine learning techniques, inter- 
cultural models and media, feminist and psychological theories to decode gender- 
and emotion-related televisual representations across different cultures. Two other 
media are examined in this section in relation to zooming and scale, this time from 
a bottom-up, implementation-oriented perspective. In the first study, Nadezhda
Povroznik, Daniil Renev and Vladimir Beresnev discuss forms of mediation that 
allow for zoom-in, zoom-out and zoom-zero modes, as well as the challenges inher- 
ent in the digitisation and virtual representation of religious sculpture from the 
Perm Art Gallery. While the possibility to switch between different zoom modes in 
a digital environment enables new ways of exploring and formulating research 
questions, the authors assert that digitising this type of cultural heritage should in- 
volve a deeper understanding of the cultural layers and contexts of existence of the 
sculpture, which may require additional knowledge regarding traditions and local 
customs in the object's region of origin, going beyond the reconstruction of its environment.
The second ground-view study deals with sonic scales, as described in
Johan Malmstedt's computer-assisted analysis, which enables him to zoom in and 
out on a 1980s Swedish broadcasting dataset. It argues that scale and zooming may 
be related to various methods of detecting sonic diversity, such as differentiating 
between music, speech and noise, or several types of noise, and that they can help 
in tracing trends and developments over time in the acoustic style of broadcasting. 
Analysis may therefore produce different results, depending on the type of zoom
applied, either within the frequency register or along the time axis. 


3.3 Hermeneutics 


This third area pertains to overhead and ground standpoints on the capacity of 
zooming and scale to inspire and shape interpretative trajectories. By adopting
a continuously scale-shifting perspective that evokes the size-appearance parable 
of the illusory giant Tur Tur, Chris and Raluca Tanasescu revisit concepts such as 
monstrosity and iconicity, anchored in the realm of intermedial and performative 
enquiries, to examine digital writing processes and illustrate their complexity 
through the case of the Graph Poem, a network of computationally assembled 
poems. The authors argue that the multi-scalar architecture of this type of poetry 
anthology, with algorithms operating at both a small scale, on poetry diction- 
related features, and a large scale, on network-topology-relevant criteria, coupled 
with a monstrous/iconic reflection filter, may inform more general considerations 
on the complex and often paradoxical inner mechanisms of text production, inter- 
connection and analysis in a networked, ubiquitous and ever-changing digital 
space. Benjamin Krautter’s four-dimensional reconstruction of Mueller’s conceptualisation
of scalable reading and its application to the network-based analysis
of a corpus of German plays represents another middle-ground exemplar of ap- 
proaching the question of scale through a combined theoretical and practical line 
of thought. The chapter unfolds as a detailed discussion and contextualisation of 
Mueller’s concept and the metaphor of zooming underlying it, followed by an illus- 
tration of how the various dimensions of scaling and a research agenda based on 
qualitative and quantitative methods can be brought together when analysing literary
texts. From the same intermediate perspective, combining theoretical and
practical aspects in scale investigation as a heuristic instrument, Florentina Arma- 
selu advances the hypothesis that texts can be conceived as multi-scale conceptual 
constructs involving various degrees of detail, and devises a method for detecting 
levels of generality and specificity in a text, applied to analyse a selection of books 
from micro-global history, literature and philosophy. The proposed method integra- 
tes elements from topic modelling, fractal geometry and the zoomable text para- 
digm to build interpretations and visualisations of informational granularity aimed 
at capturing the dynamics of meaning that emerges from the assemblage of blocks 
of text considered at different scales of representation. In the last chapter of
this section, Stephen Robertson proposes a ground view of the construction of the
digital history argument, as opposed to a print-based form of argumentation, exemplified
through the Harlem in Disorder project, a multi-layered, hyperlinked narrative
set up via the Scalar platform. According to the author, the project demonstrates
how different scales of analysis and multiple threads of interpretation can
be supported by the digital medium, enabling the user to understand the complex- 
ity of racial violence through the interconnection of a multitude of individual 
events, aggregated patterns and chronological narratives, which are more wide- 
ranging than could be contained in a book. 


3.4 Digital landscapes 


The fourth thematic group gathers viewpoints from all three perspectives and fo- 
cuses on the interplay between scale, tool-building and analysis in envisaging and 
perusing various forms of digital landscapes. Natalie M. Houston’s examination
of three open-source network visualisation tools, Pajek, Cytoscape and Gephi, tested 
on the Les Misérables character interaction dataset, proposes a critical approach 
that analyses the default aesthetics offered by these software packages in order to 
grasp the meaning and knowledge creation assumptions embedded in the design of 
this type of tool. From a humanistic standpoint, the author argues that a critical 
awareness of the ways in which network graphs are produced and facilitate under- 
standing of data structures at a variety of scales contributes to more informed data 
visualisation practices that acknowledge the role played by aesthetic choices in the 
production of meaning. Quentin Lobbé, David Chavalarias and Alexandre
Delanoë proceed from a similar bird's-eye vantage point to investigate the various 
ways in which the notions of scale and level are used in the digital humanities liter- 
ature. Using the theory of complex systems, a mathematical apparatus, and Gargan- 
Text, a free text mining software, they analyse a Web of Science and Scopus dataset 
and build visual representations of evolving branches of knowledge based on the 
conceptual distinction between level and scale understood in a more specific sense, 
as level of observation and scale of description, in the research and data analysis 
processes. Francis Harvey and Marta Kuzma adopt a mid-way approach to 
discuss and illustrate notions and techniques such as cartographic scale, zooming 
and generalisation in historical maps. After theoretical considerations and exam- 
ples, including the analysis of maps of Warsaw and the Vistula River from different 
time periods, the authors formulate a series of assumptions as to how future inter- 
pretative research in digital humanities may benefit from a better understanding 
of generalisation changes, differences between zooming and scaling, and their
impact on the graphic representations in historical maps. The Weather Map
proposal by Dario Rodighiero and Jean Daniélou offers a ground view of a visual
model designed to depict public debates and controversy through the visual
grammar of synoptic weather charts, and a Web-based implementation relying on the
Media Cloud archives and allowing for zooming and contextualisation through
access to additional information, such as links to sources and statistics. According to
the authors, this form of modelling enables users to capture controversy in the 
making and study the movements and plurality of actors that shape controversial 
events. 

Finally, this Introduction chapter outlined the conceptual framework and the 
variety of viewpoints related to the metaphorical and physical use of scale and 
zooming in digital history and humanities. Moreover, our intention was to pro- 
vide readers with the incentive they need to continue their journey through 
Zoomland and to discover and explore both its actual and its symbolic territory.


History


Alex Butterworth

Adventures in Zoomland: Transitions in Scale and the Visual Exploration
of Historical Knowledge Graphs as Sequential Storytelling


Abstract: This chapter proposes a conceptual framework for the design of graphical 
interfaces to knowledge graphs that employ the concept of ‘zoom’ - broadly defined
to encompass levels of scale, resolution and abstraction - to enable exploratory 
forms of historical hypothesis formation and testing and of narrative elaboration. It 
does so through a process of illustrative and speculative reflection on a series of re- 
cent or current digital history projects in which the author has been involved that 
involve the semantic modelling of data representing sociabilities, mobilities, tempo- 
ralities and identity and which seek to encourage a hermeneutic sensitivity to evi- 
dential uncertainty. Reference is included to methods of knowledge graph creation 
and refinement, both automated and participatory, while the extended concept of 
‘zoom’ is located within its media and aesthetic history. 


Keywords: exploratory interfaces, digital hermeneutics, knowledge graph, scal- 
able data, historical narrative 


1 Introduction 


The increasing abundance of historical data in machine-readable form, the progres- 
sive refinement of methods for its analysis, and the popularization of innovative 
interfaces for its exploration, create the possibility of novel forms of historical nar- 
rative within digitally mediated environments. What follows in this chapter is a 
proposed framework for understanding and utilizing the concept of ‘zoom’ in the 
speculative design of graphical interfaces that enable exploratory forms of narra- 
tive history-making, which are driven by and embedded in semantically modelled 
knowledge graphs, grounded in formal ontologies. The framework is considered as 
a foundational requirement for the subsequent development of a narrative grammar
in which the third meaning latent in transitions between modes of zoom applied
to configurations of knowledge can be effectively managed, to controlled and
expressive effect.


Note: With thanks to Ed Silverton, Andrew Richardson, Sarah Middle and Duncan Hay for stimulating
discussions during the design and development of practical experiments in the application of the
approaches to narrative data visualization described in this chapter.


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial 4.0 International License.
https://doi.org/10.1515/9783111317779-002

The concept of the knowledge graph will be expanded on in Section 4. Such
graphs model historical data in a form which is highly plastic and fungible, en- 
abling the virtuous creative cycle of exploration and contribution that is envisaged. 
In combination with associated domain specific vocabularies and taxonomies that 
further organize the data contained in the graph, the interrogation of such knowl- 
edge graphs may generate complex historical insights. The filmic technique of 
Zoom, considered more closely in Section 5, may be broadly understood as a range 
of operations that vary framing and focalization, here applied analogously to the 
visualization of data within an explorable graphical interface: the Zoomland of our 
envisaged historical adventures. 

The conduct of historical enquiry and production of historical accounts that the 
Zoomland framework enables resonates with many aspects of the consideration of 
history-making offered by film theorist Siegfried Kracauer in his posthumously pub- 
lished work, History: The Last Things Before the Last (2014). History, he observes, is 
constrained by a “law of levels”, according to which, “the contexts established at each 
level are valid for that level but do not apply to findings at other levels”. Yet, grounded 
in his insight that “discerning historians aspiring to history in its fullest form favor an
interpenetration of macro and micro history”, he identifies a productive equivalence
with the filmic medium in which, “the big must be looked at from different distances
to be understood; its analysis and interpenetration involve a constant movement
between the levels of generality”. Jolted by this movement across scales, the “historian’s
imagination, his interpretive designs” are freed from the canalizing effect of data
overload, “inviting less committed subjectivity to take over” (Kracauer 2014: 76; 69; 70; 70).
From this may emerge, Kracauer believes, “the historical idea” which, Rodowick notes
in his critical consideration of Kracauer’s thesis, “inaugurates a new terrain in which
a wide variety of primary historical material may distribute itself and organize itself,
illuminating previously unthought patterns of intelligibility” (Rodowick 1987: 117).

Kracauer's work was written at the dawn of an era in which the potential of 
filmic zoom to reshape an audience's relationship to the process of fabulation 
was fully demonstrated by a new generation of directors, including arguably the 
greatest proponents of the technique, Robert Altman and Roberto Rossellini. Adopting
a fluidity of style that has been likened to improvisational jazz, Altman - according
to Jay Beck - placed the audience for his 1970 film MASH “in an active
spectatorial position, being asked to sift through the audiovisual information in
order to follow one of the potential narrative paths” (quoted by Hall 2018: 26). A
decade earlier, in his film recounting the nineteenth-century campaign for national
independence, Viva l’Italia, Rossellini had already applied zoom techniques
to specific history-making effect. Commenting on a scene in which Garibaldi’s
men fight on a hill, John Belton notes that under Rossellini's direction, “Long shot 
equals then. Zoom shot equals now. The two shots in tandem are no longer lim- 
ited to an imitation of event. What we are watching is our own aesthetic and ideo- 
logical distance from the event” (Belton 1980/81: 24). 

In recent years, the notion of the Macroscope has become established to de- 
scribe a form of digital instrument by which the historian can similarly bridge 
and synthesise the big and the finely curated small data to which the digital turn 
has given rise. The benefits it may deliver have been variously conceived. For 
Tim Hitchcock, it makes possible the “radical contextualization” of individuals 
“by exploring the multiple archival contexts in which they appear or are
represented”; for Armitage and Guldi it enables a “weaving together [of data] into one
inter-related fabric of time”, while it allows Julia Laite, who quotes the preceding, 
to “construct a prismatic view of a complex phenomenon, to write a kind of his- 
tory in the round [. . .] [to] massively expand the synchronic and diachronic con- 
nections between people, places, and experiences" (Laite 2020: 965; Armitage and 
Guldi 2014). The proposed framework seeks to address those desires, reapplying 
the concept of zoom to an exploratory data environment in which scale is col- 
lapsed and “historical ideas" generated. It is a framework that supports a range 
of manifestations of that negotiated data, from a relatively abstract rendering of 
data points as graphically arranged nodes to immersive experiences, with the evi- 
dential sources, in print or image form, directly accessible. 


2 Uncertainty in the work of the narrative historian


Whilst rooted in the observed needs of diverse historians and projects of histori- 
cal research, the Zoomland framework also foregrounds certain attitudes that I 
have adopted in my own work as a narrative historian: one whose aim is to pres- 
ent deeply researched historical accounts to a broad and non-specialist audience 
and readership. Among these is the aspiration to a sophistication in storytelling 
(or story-building) that equates to both the literary and the cinematic, with their 
vast but quite distinct toolboxes of tricks and sleights-of-hand and genres: techni- 
ques by which the relationship between percipient and story (Propp’s szujet, at its 
rawest) is mediated and manipulated into fabular constructions and complex nar- 
ratives (Propp 1968). 

This aspiration could be seen as problematically contrarian for an author of
traditional analogue narrative history, though perhaps more congenial in the realm of
digital history, since it involves a distrust of the very voice in which historical
narratives are expected to be written for a non-academic readership. The nub of this
unease is encapsulated for me in the editorial advice I repeatedly received from
publishers to be ‘magisterially authoritative’. The implication was less that an au- 
thor should demonstrate a mastery of their subject, which of course entails a full 
awareness of the absences and lacunae and hermeneutic challenges, than that they 
should perform absolute confidence as a presiding narrator of the past. Tone, tense 
and genre were to be employed as mechanisms of gentle coercion: the use of condi- 
tional and subjunctive constructions should be eschewed, polyvocalism tightly man- 
aged, the knowable facts wielded as talismans. 

Both books I have published were written in fierce but subtle tension with 
such imperatives, favoring implication over forthright assertion, cherishing ambi- 
guity at their core. One book (Pompeii: The Living City) examined the social and 
political networks of a single, uniquely preserved Roman town as a microcosm 
for a period of imperial rule and Mediterranean trade and culture, interpreted 
through shards of epigraphical and material evidence; the other (The World That 
Never Was) looked at a late-nineteenth-century milieu of political conspiracies 
and subversion, in which the global diaspora that spread anarchist doctrine, and 
the international coordination of its policing, was discerned through the reports 
of informants to their secret service paymasters (Butterworth and Laurence 2005; 
Butterworth 2010). The books set out to prompt readers to engage with interpre- 
tive positions which might sometimes be uncomfortable or confusing: even leav- 
ing the reader suspended, temporarily, in a state of uncertainty, providing space 
for reflection on the evidential foundation of the narrative. 

Techniques for highlighting uncertainty were important when confronted 
with subjects that exposed the margins of confident knowability, dependent as 
my books were upon the reconstruction of fragmentary and unreliable historical 
and archaeological evidence, and demanding of speculative boldness. These tech- 
niques are, I believe, even more essential as touchstones when grappling with the 
design of digital engagements with the past, which rest on data whose biases and 
absences require constant attentiveness. This engagement must remain open, ex- 
ploratory and dialogic. In cases where exploration is heavily channeled by au- 
thoritative constraints, these will soon be felt as a deterrent by the user who 
craves the ability to pursue their own instincts of inquiry, frustrating their desire 
to probe beyond surface appearances and to test multiple configurations of data. 
Either that, or those constraints will have been so deeply insinuated as to co-opt 
the liberating process of enquiry itself to rhetorically manipulative ends. The re- 
sponsibility in designing such engagements is therefore social and political, as 
well as scholarly.

The proposed framework conceives of digital narrative history as a mode that
embraces dynamism, dialogism and multiplicity. Authorship in this environment 
inheres in acts of expeditionary trace-making performed by each investigator of 
the dataset. The data text (a state sequence, as discussed in Section 10 onwards) 
that is produced by the investigator may, as they prefer, be deemed ephemeral and 
discarded, or else preserved and enmeshed with those of others; it may be support- 
ive of other trace-texts or it may contest them, in part or as a whole. The initiating 
author only proposes a path and rehearses the methods of its pursuit. These start 
with enquiry and analysis that is more or less informed by the existing knowledge 
of the investigator, both of the domain and the dataset, with each gradually
accumulating. The investigator progresses to forming hypotheses and iteratively
testing them, then to the recording of insights, the presentation of argumentation,
and its further elaboration as narrative. The narrative forms emerge from the 
steps and missteps of the investigation, each of which constellates the salient nodes 
in the knowledge graph, faceting their relations and elucidating their semantic sig- 
nificance. Any externally conferred authority is complemented by the collective 
recognition of skill in the dance of meaning-making, and the insights afforded by 
the traces that have been made. I will return in Section 7 to consider this interac- 
tion between the individual investigator and the collective acknowledgement of 
meaning-making. 


3 The projects referenced and their datascapes 


The conceptual framework proposed here derives from extensive and long-term 
investigations of user requirements across diverse historical datasets. It is illus- 
trated by reference to a number of recent digital history projects, primarily three 
with which I am currently, or have been, involved. Those three projects (and their
associated datasets) are The Lysander Flights, Tools of Knowledge and Crimes in 
London. Crimes in London was an experiment involving the use of machine learn- 
ing to explore the narratives of criminal activity and witness accounts in the Old 
Bailey criminal trials, and is distinct from the Old Bailey Macroscope created by 
colleagues from the Sussex Humanities Lab: a project which is also considered, as 
a key precedent for work in this area.¹ The Lysander Flights is a detailed study
and exploratory narrative account of the operations in which the low-flying RAF 
Lysander aircraft flew secret agents and resistance leaders into and out of occu- 
pied France between 1940 and 1944: their conduct, context and consequences. 


1 “Old Bailey Voices”, accessed February 18, 2023, https://oldbaileyvoices.org/macroscope.php.

The dataset on which it rests is compiled from minute-by-minute operational
reports, and records of personnel and strategic planning, the organization of
resistance circuits, and the executions of resistance members. Tools of Knowledge
concerns the creative communities involved in the production and use of scien- 
tific instruments in Britain during the four-and-a-half centuries up to 1914, and 
the instruments themselves as encodings and vectors of both craft and scientific 
knowledge.² At its heart is a large and meticulously curated legacy database, the
product of a single curator’s long career, which has been semantically remodeled 
and complemented with artefact data from multiple museum collections, as well 
as data from diverse sources that broadly contextualizes the entities involved: 
persons, places, objects, businesses and institutions. 

Taken together, these projects present a spectrum of possibilities for complex 
investigative and narrative methods - covering sociabilities, mobilities, temporal- 
ities and identity. As described in Section 3, the datasets on which they rest are 
semantically modelled, both in order to capture the greatest possible nuance in 
the data and to enable interoperability with other datasets. Each entity men- 
tioned above is defined as a node within a knowledge graph: each is potentially 
linked to every other entity type, as the data dictates. These linking edges are 
themselves additionally typed using controlled vocabularies to describe in detail 
the nature of the particular relationship, with each relationship characterized as 
an event. A social relationship between people might be defined as family, or 
business, or legal, for example, with a taxonomical account of finer grain rela- 
tions such as husband or brother-in-law (family), partner, apprentice or employee 
(business), prosecuting lawyer or co-defendant (legal). Each entity node is further 
characterized by its attributed properties: nationality or gender, for example, in 
the case of people; type, size or material for an object in a museum collection, 
with additional relevant and domain specific attributes of call sign and engine 
type for a Lysander aircraft. 
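
The modelling described above can be illustrated with a minimal sketch, assuming plain Python dataclasses rather than any particular graph database or published ontology. Entities become nodes carrying attributed properties, and each link is a typed edge whose relationship term must belong to a small controlled vocabulary organised by coarse category. The class names, the vocabulary dictionary and the sample records are hypothetical illustrations, not drawn from the project datasets.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: coarse category -> finer-grained terms.
RELATION_TAXONOMY = {
    "family": {"husband", "brother-in-law"},
    "business": {"partner", "apprentice", "employee"},
    "legal": {"prosecuting lawyer", "co-defendant"},
}


@dataclass
class Node:
    node_id: str
    entity_type: str  # e.g. "person", "object", "aircraft"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    category: str
    relation: str

    def __post_init__(self):
        # Reject relation terms that are not in the controlled vocabulary.
        if self.relation not in RELATION_TAXONOMY.get(self.category, set()):
            raise ValueError(f"{self.relation!r} is not a {self.category!r} relation")


class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        # Both endpoints must already exist as nodes.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges.append(edge)

    def relations_of(self, node_id, category=None):
        """Return edges touching node_id, optionally filtered by category."""
        return [
            e for e in self.edges
            if node_id in (e.source, e.target)
            and (category is None or e.category == category)
        ]


# Invented sample data, for illustration only.
graph = KnowledgeGraph()
graph.add_node(Node("p1", "person", {"gender": "male"}))
graph.add_node(Node("p2", "person", {"gender": "male"}))
graph.add_node(Node("a1", "aircraft", {"call_sign": "EXAMPLE", "engine_type": "example"}))
graph.add_edge(Edge("p1", "p2", "business", "apprentice"))
```

Filtering by coarse category, as in `graph.relations_of("p1", "business")`, corresponds to a looser view of the same relations; inspecting the `relation` term of each returned edge corresponds to the finer grain.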

Wherever possible, the relations between entities are themselves modelled as 
events, which may involve actors (animate or inanimate, as individuals or as- 
semblages), locations, activities and times. A crime is modelled as an event, as is a 
wartime operation, but so too are the atomized constituents of each of these: a 
button accidentally left at a crime scene, an encounter with anti-aircraft fire over 
the city of Tours. All of these entity nodes carry properties of their own, which 
include information about their evidential basis or provenance, and metadata 
about the reliability of the data. The automated geocoding of a placename against
a gazetteer might have a probability of 78% and may or may not have been
human-authenticated; a time might be circa, or represented by a span of
uncertain start or conclusion (“born before”, “died by”), or it might be a spot date
standing in for a span (working life: only record, 1756). The qualitative and quantitative
characteristics of the data vary, and with these their tractability to more abstract
or concrete representation, with the potential for media that actualize the former
being linked in to the datasets. This modelling of the historical data will be
usefully borne in mind in relation to what follows.


2 “Tools of Knowledge, Modelling the Creative Communities of the Scientific Instrument Trade,
1550-1914”, accessed February 18, 2023, https://toolsofknowledge.org/about/, (AHRC AH/T013400/1).
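
The kinds of evidential and temporal uncertainty just described might be recorded along the following lines. This is a sketch under stated assumptions: the field names, the review threshold of 0.9 and the five-year tolerance for circa dates are all hypothetical choices made for illustration, not features of the projects' actual schema.

```python
from dataclasses import dataclass
from typing import Optional

CIRCA_TOLERANCE_YEARS = 5  # assumed tolerance applied to "circa" dates


@dataclass
class Provenance:
    source: str                          # evidential basis of the assertion
    confidence: Optional[float] = None   # e.g. 0.78 for an automated gazetteer match
    human_authenticated: bool = False

    def needs_review(self, threshold: float = 0.9) -> bool:
        """Flag unauthenticated assertions with missing or low confidence."""
        if self.human_authenticated:
            return False
        return self.confidence is None or self.confidence < threshold


@dataclass
class UncertainDate:
    earliest: Optional[int] = None   # year; None leaves the start open ("born before")
    latest: Optional[int] = None     # year; None leaves the end open
    circa: bool = False
    spot_for_span: bool = False      # a single record standing in for a longer span

    def could_include(self, year: int) -> bool:
        """True if the recorded evidence does not rule the year out."""
        if self.spot_for_span:
            # Only one record survives, so the true span cannot be bounded.
            return True
        slack = CIRCA_TOLERANCE_YEARS if self.circa else 0
        if self.earliest is not None and year < self.earliest - slack:
            return False
        if self.latest is not None and year > self.latest + slack:
            return False
        return True


@dataclass
class Event:
    activity: str
    actors: list
    place: Optional[str]
    when: UncertainDate
    provenance: Provenance


# Invented examples mirroring the cases mentioned in the text.
geocode = Provenance("automated gazetteer match", confidence=0.78)
born_before = UncertainDate(latest=1700)
working_life = UncertainDate(earliest=1756, latest=1756, spot_for_span=True)
sale = Event("instrument sale", ["maker", "customer"], "London", working_life, geocode)
```

Keeping the uncertainty on the event itself, rather than resolving it at import time, leaves an interface free to decide how much of that doubt to show at a given level of zoom.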


4 How to make a knowledge graph 


Knowledge graphs are a form of structured knowledge representation, derived 
from potentially multiple data sets and sources, and modelled at various levels of 
refinement and authority, according to the interests of those who have contrib- 
uted to their construction and content. Varied pipelines composed of methods 
and tools support more or less automated processes to extract the data and then 
transform and load it (ETL), or extract and load the data and then transform it 
(ELT): the sequencing of the phases determined by a combination of source type, 
analytical needs, and storage capacity for processing. The components of the pipe- 
lines range from Natural Language Processing methods - including Named Entity 
Recognition or Cluster Labelling approaches that may leverage word or concept 
embeddings to capture semantic nuance - through to semi-automated tools such 
as OpenRefine. At the time of writing, the rapid emergence of Generative
Pre-Trained Transformers and experimental tools such as GPTIndex³ promises even
higher levels of automation.
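
The difference between the two orderings can be shown with a toy pipeline, assuming in-memory lists and dictionaries in place of real sources and stores. The function names and sample records are hypothetical; real pipelines would substitute the NLP and reconciliation steps mentioned above for the simple string clean-up used here.

```python
def extract(source):
    """Pull raw records from a source (here, simply copy a list of strings)."""
    return list(source)


def transform(records):
    """Normalise raw 'name;place' strings into structured dictionaries."""
    structured = []
    for record in records:
        name, place = (part.strip() for part in record.split(";"))
        structured.append({"name": name, "place": place})
    return structured


def etl(source, store):
    # Extract, transform, then load: only the cleaned form is stored.
    store["records"] = transform(extract(source))
    return store


def elt(source, store):
    # Extract, load, then transform: the raw data is stored first, so it can
    # be transformed again later as analytical needs change, at the cost of
    # additional storage.
    store["raw"] = extract(source)
    store["records"] = transform(store["raw"])
    return store


# Invented sample records, for illustration only.
raw_source = ["  Jane Doe ; London", "John Roe;  Tours "]
etl_store = etl(raw_source, {})
elt_store = elt(raw_source, {})
```

Both stores end up with the same structured records; only the ELT store also retains the raw strings for later re-processing.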

At its most refined, the resultant knowledge graph is aligned with a formal 
schema, which is itself reconciled where possible with published ontologies,
including those for source citation: CIDOC-CRM, BIO CRM, VIAF, FaBiO, CiTO, etc.
(potentially using OpenRefine’s proprietary near-cousin, OntoRefine, or
equivalent tools).⁴ Beyond this, specific interest groups that contribute rich and densely
modelled data, generated by them during a tightly focused project of research,
may either embroider extensions to this core ‘authority’ schema, or devise novel
ontologies for their own immediate interest. As Beretta has explained in relation
to the symogih.org project, identifying those points of overlap and concurrence
enables interoperability, and allows immediate semantic mappings to be made
between the core graph’s nodes and those populating its hinterland (Beretta 2021).


3 “Welcome to the GPT Index”, accessed February 18, 2023, https://gpt-index.readthedocs.io/en/latest/.

4 “What is the CIDOC-CRM?”, accessed February 18, 2023, https://cidoc-crm.org; Jouni Tuominen,
Eero Hyvönen and Petri Leskinen, “Bio CRM: A Data Model for Representing Biographical Data
for Prosopographical Research”, accessed February 18, 2023, http://www.sparontologies.net/ontologies/fabio;
https://sparontologies.github.io/fabio/current/fabio.html, accessed February 18, 2023, https://viaf.org.

Stepping back one degree more in the graph, we find ourselves further
distanced from rigid formalizations and edging towards the folksonomical and
emergent: bottom-up terms and concepts, linked to the schema only by means of a ‘fuzzy
ontology’ that accommodates them loosely. This material, which could encompass
free-text comments contributed by Zoomland explorers or digitized historical texts
or transcripts of audiovisual accounts, may in turn be computationally annotated,
extracted, abstracted and modelled, using similar methods to those discussed. These
processes can preserve the micro-graphs representing, in simple terms, a set of enti- 
ties linked by their description of an event. At a higher level of abstraction, they 
may also capture those semantic vectors drawn together by the strength and density 
of their connecting edges to reveal concepts that have clustered according to some 
semantic affinity, within the high-dimensional space generated by the representa- 
tion of the myriad word associations derived from textual sources. The folksonomi- 
cal free text source may therefore be conceived of, in graphical terms, as loose 
constellations of nodes floating around the ever more densely modelled data to- 
wards the graph’s core, spatially distributed as it would be by a force-directed layout 
algorithm. Constellations that are barely tethered at the periphery may be more 
closely integrated to the core, through the step-by-step linkage of nodes by means of 
newly minted edges. The tasks involved in the identification of these edges - as
candidates for semantic interest and contextual relevance - may be performed, in
turn, by a combination of machine learning methods and the contribution made by 
those human explorers of the knowledge graph who are drawn to assist in its devel- 
opment at a more artisanal level. 

To most effectively support a virtuous cycle of contribution, the nature of the 
user's engagement with the graphical interface, by means of which they can ex- 
plore the semantic content of the graph, should include exploratory reward and re- 
motivation. The process of contribution should be well considered but relatively 
frictionless; whatever friction is encountered should result from an intentional 
choice on the part of the explorer to deviate into a more conscious engagement 
with the data, or else it should be algorithmically-generated as an injunction to 
heightened criticality, based on an assessment of the validity of the contribution. 
Within modes and layouts, however, the operational mechanism for informational 
and cognitive adjustment is here conceptualized by analogy with cinematic ‘zoom’, 
as it has variously been deployed.


5 A media archaeology of zoom: Digression and immersion


So, what is meant here by ‘zoom’ and where does the concept sit in a
hermeneutics of data exploration and emergent digital storytelling? In traditional cinematic
terms zoom is, at its simplest, a change in the focal length of the lens which, in 
contrast to a camera move towards a subject, retains the relative size and signifi- 
cance of the subject and background, while narrowing the view-shed. It signals 
attentive interest, but is only a tentative step towards recentering that interest 
which a camera move would have advanced a step further, and with a subjective 
focalization offering the most complete recentering, by means of an edit or cut. 
Whatever the zoom’s velocity and extent — a short whip or deep crash; buried and 
contemplative — its effect is to generate a subliminal sense of provisionality. Like 
the suspended moment of evidential uncertainty mentioned in Section 2, it offers
an instant of self-reflection, in which the viewer wonders where their interest 
might next be led: whether inward to the soul of the subject, or outward to con- 
textualize its situation, or even displaced onto some other subject which will be 
viewed relationally to it. Such is the potent effect of Zoom in cinematic form on 
the spectator. 

Fixed in an authored audio-visual or graphical sequence, the speculation en- 
gendered by the zoom as by any other composition or camera move, according to 
the skill of its construction and deployment, propels narrative engagement. For 
us, in seeking to define the design requirements for an exploratory Zoomland, it 
might translate into a moment that informs and compels a more active expres- 
sion of agency in what the next configuration of visual information will be. How- 
ever, for such progressive choices to be most appropriately informed, the fixed 
relationship of subject and background that the zoom ensures may benefit from 
inflections which at least hint at context, or even by previews of those next steps 
and what they might reveal. The experience here might be conceived as similar 
to the liminal zone of peripheral vision, where an apprehension of movement or 
change in light intensity is sufficient instinctually to prompt further investigation. 
So, for example, the visualization which, on completion of its putative zoom, 
would switch fully from one faceting of the knowledge graph to another - to a 
looser temporal granularity, perhaps, or tighter geospatial resolution — here sur- 
faces only those entities algorithmically predicted to be most salient to the ex- 
plorer. How entities are selected for this proleptic function and how they are 
evoked will vary by the circumstantial configuration of the interface. 

Although the most conventional understanding of zoom - as a tightening of 
view-shed and loss of peripheral vision - may confer benefits in certain Zoomland
scenarios, it should not preclude other applications analogous to the technique.
Nor should the interactive application of zoom be inflexibly reverential to its ori- 
gins in cinematic or televisual media, where its practical use has been limited and 
often modish. Rather, the core mechanisms of zoom may be beneficially modified 
for this novel grammar of Zoomland storytelling in combination with other techni- 
ques for the manipulation of the cinematic image and their cognitive effects, such 
as dollying moves (in or out) that either complement or contradict the zoom. 
Changes in depth of field can blur or sharpen background or foreground, shifting 
attention between them or altering their respective resolution to suggest varying 
relationships or impressionistic affinities. The lenticular effects of pulled focus may 
also redirect us away from an individual subject to the social or collective: a notion 
developed, with different intent, in the theory of ‘lenticular lenses’ applied to the
digital analysis of text.⁵ By embracing an expanded definition of Zoomland’s media
archaeology, all these translated affordances of cinematic zoom become available for
use in varying permutations, sequenced to produce expressively nuanced results. 

Of equal relevance and complementary to such cinematic precedents, though, 
for current purposes, are insights drawn from the great theorist-practitioners of re- 
alist and modernist prose fiction: the potency of digression in Hugo and the layering 
of social and spatial networks; the densely evoked cognitive realism of Balzac, with 
its accumulation of persuasive detail; Tolstoy's eagle eye that discovers relevance in 
and between Napoleon's cold and the chaos of the battle of Borodino, as Carlo Ginz- 
burg has referenced with regard to the macroscopic ambitions of micro-historical 
practice (Ginzburg 2012, 209). All are suggestive of principles for novel modes of sto- 
rytelling with the historical data, which will refine the ideal interface design frame- 
work envisaged. This is particularly so for the more abstract levels of zoomed 
visualization at which it is the identification of patterns in the data that approximate 
to contextualizing story beats. The distribution of pickpocketing in eighteenth-century
London viewed in relation to that of the watch houses that facilitated its policing,
for example; or audits of stock held by an instrument maker, derived from
probate auction catalogues, with the dimensions of instruments, individually and 
collectively reckoned against the likely square footage of the display and storage 
space in their commercial premises; or the intensity of a Lysander pilot's flight re- 
cord correlated with pilot-related accidents. 

Examples such as these may be evocative as well as analytically revealing, 
even when encountered through the abstract visualization of relevant data.


5 Al Idrissou, Leon van Wissen, and Veruska Zamborlini, “The Lenticular Lens: Addressing
Various Aspects of Entity Disambiguation in the Semantic Web,” paper presented at Graphs and
Networks in the Humanities, Amsterdam (3-4 February 2022).
However, to envisage the framework for an exploratory historical storyworld exclusively
in these terms would be negligent, at a moment when the dominant media
and genres for storytelling are themselves immersive: in their photo-realistically 
rendered virtuality, their kinetic preoccupations, and often in their exploratory 
and constructionist affordances too. Indeed, the digital capture of physical objects 
and environments as point clouds, whether through photogrammetry of instru- 
ments that are representative of those from probate auctions, or laser scans of a 
crashed Lysander’s cockpit, or extruded maps of historical London ground plans, 
raises the prospect of further negotiations between data models, of quite different 
scales and semantics. The prospect afforded is one that makes digitally manifest 
Chekhov’s widely paraphrased and abbreviated advice to the short story writer: 
“Don’t tell me the moon is shining; show me the glint of light on broken glass” 
(Yarmolinsky 2014: 14). In such cases, zoom could serve an important mediating 
and interpretive function, bridging the concrete and the abstract, to simulate the 
situated immediacy of lived experience, albeit in a form that must be responsibly 
hedged around with caveats and correctives. So, whilst the suggested framework 
does not countenance the integration of historical data exploration and ludic en- 
vironments, I would argue that it should encompass notions of both “playability” 
(with dual senses of “play”: as rehearsal without formal constraint, and of the 
scope for minor movement within complex structures or in mechanical motion) 
and of immersion. 


6 The idea of scale - the three principal axes 


Before progressing further through the speculative design of frameworks for 
Zoomland visualizations, it will be useful to consider briefly the most familiar af- 
fordance to which principles of zoom are applied: that of scale. At every turn in 
their work, the historical researcher must confront the question of the most ap- 
propriate scales at which to scope their subject, or into which their subject may 
usefully be decomposed. At what level of scale in the data available to them is it 
possible to detect change and assess the range of causal effect, and with what con- 
fidence, as determined in part by the quality and representative claims that can 
be made of that data? Following directly from this, and further complicated for
historians working in a primarily narrative mode, are questions regarding the 
methods by which any decompositions can be recombined, and the validity of the 
insights produced by those recombinations. Zoom, in a straightforward conceptu- 
alization, allows experimental movement and comparison between the general 
and specific, the whole and the component. For the conscientious expert historian 
its products will prompt not only further questions and hypotheses about how
each informs the other, but also an implicit problematization of those relations 
and the assumptions that underlie them. 

With its remit to rehearse investigative expertise for a broad public, and to cul- 
tivate their capacity for such data exploration, the Zoomland interface must go fur- 
ther than merely making visually accessible such straightforward zoom-in-scale. It 
must additionally surface these processes, reifying the dialectic interplay between 
them through the presentation of data that is faceted as contextually apposite, at 
any point. The scales of data involved in the exemplary projects discussed in this 
chapter, considered in terms of volume and complexity of modelling or detail, 
exemplify a broad range of possible historical datasets. As such they also illuminate
issues in the management of faceted data visualization and how these mechanisms
may serve to bridge between one scale and another: as understood most
directly in terms of spatial and temporal data, and within the graph of actor- 
relations. Furthermore, they should accommodate varied modes of critical interro- 
gation: visual (pattern-matching), speculative (causal hypothesis-forming) and emo- 
tional (situated). 

To address these requirements, the proposed framework for zoomable explora- 
tion is conceived as comprising three principal axes, each interacting with a contin- 
uous scaling, calibrated around three recognized scales of study (micro, meso and 
macro) and three core modes of visual organization (spatial, temporal and network). 
The three axes which variously characterize the conceptualization of zoom are de- 
fined as: (1) Level of Detail, involving the nature and amplitude of information ap- 
propriate to the current zoom level; (2) Level of Abstraction, from statistically 
evaluative to situated and immersive, or critically reflective to emotionally co-opted; 
(3) Cognitive, by which the depth and faceting of contextual information is focalized. 
A fourth potential axis, concerning authority and certainty, is introduced separately
in Section 11. ‘Level’ is used here, in relation to all axes, to describe the level
of zoom that is set at any moment, along the range of possible settings. 
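
By way of illustration only, the three axes and the three layout modes might be held together in a single record of interface state. The following Python sketch is an assumption made for clarity rather than part of the framework as specified: the names `ZoomState`, `Layout` and `scale_band`, the normalization of each axis to a value between 0 and 1, and the equal thirds used to label macro, meso and micro are all hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Layout(Enum):
    """The three core modes of visual organization named in the text."""
    SPATIAL = "spatial"
    TEMPORAL = "temporal"
    NETWORK = "network"


def scale_band(level: float) -> str:
    """Label a continuous zoom level in [0, 1] as macro, meso or micro.

    The equal thirds are an arbitrary assumption for illustration.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("zoom level must lie between 0 and 1")
    if level < 1 / 3:
        return "macro"
    if level < 2 / 3:
        return "meso"
    return "micro"


@dataclass(frozen=True)
class ZoomState:
    """One setting of the interface: a layout plus a level on each axis."""
    layout: Layout
    detail: float       # Level of Detail: 0 = coarsest, 1 = finest
    abstraction: float  # Level of Abstraction: 0 = most abstract, 1 = most immersive
    cognitive: float    # Cognitive Zoom: 0 = least context, 1 = most context

    def __post_init__(self):
        # Reject any axis value outside the assumed 0..1 range.
        for axis in ("detail", "abstraction", "cognitive"):
            scale_band(getattr(self, axis))

    def bands(self) -> dict:
        """Report the named scale band currently set on each axis."""
        return {axis: scale_band(getattr(self, axis))
                for axis in ("detail", "abstraction", "cognitive")}


if __name__ == "__main__":
    state = ZoomState(Layout.SPATIAL, detail=0.1, abstraction=0.9, cognitive=0.5)
    print(state.bands())
    # Moving along one axis leaves the layout and the other axes untouched.
    print(replace(state, detail=0.7).bands())
```

Treating the state as immutable means that each adjustment of zoom yields a new record, so that earlier settings remain available for comparison.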

Zooming on the Level of Detail axis, calibrated for informational resolution, is 
analogous to the illusionistic methods by which the developers of immersive envi- 
ronments switch in and out images or models, as the zoom reaches different pixel 
or polygon counts, as imperceptibly as is technically possible within a smooth ani- 
mation, to avoid the lag that may be caused by loading overlarge image files rela- 
tive to available processing power. It may also be imagined as an extrapolation of 
the methods employed on zoomable digital maps, where the most zoomed-out view 
emphasizes topography and transport infrastructure, for the purposes of long- 
distance route planning; while the most zoomed in exposes infrastructure for pe- 
destrian way-finding; with an intermediate level which highlights resources or 
points of interest that may localize the interest of the former user, or expand the
horizons (or advertise to the consumerist needs) of the latter. In this respect, it begins
to intersect with Cognitive Zoom.
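
The switching of representation at thresholds of zoom can be sketched for temporal data as readily as for maps. The Python fragment below is a minimal illustration under stated assumptions: the function names, the four granularities and the cut-off values on a 0 to 1 Level of Detail scale are hypothetical, and the dates in the usage example are invented.

```python
from collections import Counter
from datetime import date


def granularity(detail: float) -> str:
    """Choose a temporal unit for a Level of Detail value in [0, 1].

    The cut-offs are arbitrary and would be tuned per dataset.
    """
    if detail < 0.25:
        return "decade"
    if detail < 0.5:
        return "year"
    if detail < 0.75:
        return "month"
    return "day"


def bin_key(day: date, unit: str) -> str:
    """Return the label of the bin into which a date falls."""
    if unit == "decade":
        return f"{day.year // 10 * 10}s"
    if unit == "year":
        return str(day.year)
    if unit == "month":
        return f"{day.year}-{day.month:02d}"
    return day.isoformat()


def aggregate(days, detail: float) -> dict:
    """Count dated nodes per bin at the granularity implied by the zoom level."""
    unit = granularity(detail)
    return dict(Counter(bin_key(day, unit) for day in days))


if __name__ == "__main__":
    # Invented dates standing in for dated event nodes.
    events = [date(1943, 3, 1), date(1943, 3, 15), date(1943, 11, 2), date(1944, 1, 5)]
    for level in (0.1, 0.3, 0.6, 0.9):
        print(level, aggregate(events, level))
```

The same pattern would apply to spatial or network aggregation by substituting a different binning function.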

This third axis of zoom, Cognitive Zoom, has the greatest novelty as a principle 
for visualizing data and also involves a greater complexity of dynamic calculation 
in transforming the graphed data according to rules customized for individual da- 
tasets. It may be understood most straightforwardly as the constellation of context 
that is deemed most revealing at any moment in relation to the immediate objec- 
tive of the exploration. Or in presentational terms, the context that the viewer 
needs to know at this point on their narrative journey to make fullest sense of 
what follows. Its close antecedents lie in the work of Marian Dörk, his formal conceit
of the “information flaneur”, and the visualization styles prototyped in his
work on monadic exploration. This draws on Tarde's concept of the monad to chal- 
lenge the distinction between whole and part, while more recent collaborative 
work progresses to the concept of the ‘fold’, the latter elaborating the former as a 
topological model that exists in a continual state of dynamic folding of space, movement
and time (Dörk et al. 2011; Brüggemann et al. 2020). For present purposes,
though, the concept is animated rather through reference to filmic technique - the 
dollied zoom - by means of which the central object of interest is simultaneously 
approached focally and actually, with the foreground held steady while the back- 
ground is advanced or, in this case, made informationally salient. 

Level of Abstraction is probably the most readily understood as being, at one 
extreme of the axis, photorealistic immersion whose verisimilitude produces a 
sense of situated presence, potentially involving haptic, tactile and olfactory senses 
in addition to the visual. It is, or might be conceived, crudely, as the ultimate histor- 
ical theme park experience. More difficult to envision is the other extreme of the 
axis where the information contained in the knowledge graph into which the data- 
set has been modelled is expressed at its rawest: a dense tangle of nodes and edges 
that exist only in a condition of illegible high-dimensionality. As discussed in Sec- 
tion 12, for whatever purpose this is thought tractable to human interpretation, this 
multivariant data must be reduced to the simplest form at which it can be visual- 
ized, in two or three dimensions (or possibly four, if some means of filtering by 
time is included). Between these poles of the axis lie innumerable possible calibrations:
progressing away from the highest point of abstraction, at each of these the
visualized dimensionality of the data is further reduced, by means of the applica- 
tion of algorithmic or graphical filtering. Conversely, we may approach this from 
the concrete extreme of the Level of Abstraction axis, by which the relations con- 
cealed in the data that drive the immersive simulacrum with which the participant 
is first presented are incrementally decomposed for analysis. Here again, intersec- 
tion with the axis of Cognitive Zoom is likely to occur, whilst intersection with the 
Level of Detail axis of zoom is also possible. 


7 How authority is conferred, indicated
and deployed 


For the user, the drift towards Abstraction in this interplay of axes is liable to 
entail a gradual weakening of those points of intellectual and imaginative pur- 
chase derived from immediate human experience, and a drift towards ever 
higher levels of abstract conceptualization, that invoke and require more schol- 
arly domain knowledge. An immediately comprehensible example of this might 
relate to time and human memory. The experience of a day or a week or a month 
or a season is familiar to a contemporary reader or data explorer and may be 
tacitly invoked to afford points of identification with historical experience. The 
temporal divisions, rhythms and functions of these time spans are, of course, 
often radically different according to context, historical and otherwise, as reflected
even in their naming: the exigencies of planting and harvest, of economic
behaviors dependent on sailing seasons, of fasti and nefasti or feast days and fast 
days, of lives lived by daylight or by standardized railway time. Nevertheless, the 
shared experience of time at these scales can provide imaginative purchase on 
that difference, which an author may more or less explicitly evoke. 

It is this potential to analogize, however approximately, which is drawn on by 
techniques such as the graphical distortion of an otherwise uniform segmentation 
of time to communicate more subjective temporal experiences - intensity of activ- 
ity, the anxiety of waiting - in relation to timeline visualization. Career courses, 
lifespans, even generational shifts are similarly amenable to an analogizing ap- 
proach, to produce an immediacy of narrative engagement. In static forms of data 
visualization, convention insists on regularity in the relationship between any 
property and its spatial distribution or proportional scaling, with any deviation 
from this rule considered as irresponsible rhetoric which risks misleading the in- 
terpreter. Such techniques may be afforded a different legitimacy, however, where 
the information space is itself recognized as a dynamically expressive environment. 
In this case, the relationship between background and foreground can itself be ex- 
perimentally adjusted by the configuration of zoom settings, on any axis, while 
comparison between sequential animated states makes apparent any rhetorical devices.
As when toggling between maps that represent geographies in Euclidean space
according to conventional projections, and cartogram representations that morph 
according to the geospatial distribution of additional data, the agency of the explorer


6 Alex Butterworth, “On the growth of my own mind: Visualising the creative process behind
Wordsworth's autobiographical epic, ‘The Prelude’, in context”, accessed February 18, 2023,
https://searchisover.org/posters/butterworth.pdf.
and the perception of narrative development converge to generate imaginative
engagement and suggest new interpretations (Dorling 1996).

The effect of such techniques becomes increasingly etiolated, however, as the 
temporal duration expands. When we reach the level of historical periodization, of 
longue durées and epochal change, an effective historical hermeneutic is likely to 
depend on specialist scholarly frameworks, often particular to a field or domain. The 
application of these collectively tuned knowledge-producing practices may reveal 
significance, suggest hypotheses and even bring to the surface provisional narratives 
in the patterning of data which to a layperson will remain mute and perplexing. The 
design of visualization layouts at such a level of zoom on the Abstract-Concrete axis 
must, it could be suggested, primarily serve the requirements of these more sophisti- 
cated users, even at the risk of presenting a complex and deterrent appearance to 
the non-expert. A visual form that is as simple as a scatterplot, such as that produced 
by Hitchcock and Jackson for their Old Bailey Macroscope, with its axes representing 
number of words per trial and sequential trial reference numbers, is interpretable,
even in the most basic terms, only with implicit knowledge of file naming conven- 
tions (and their relationship to chronologies, at micro and meso scale), as well as 
changes in methods of court recording or relevant legislation. A deeper analysis of 
change in trial length over a period of centuries might require more “microscopic” 
knowledge of courtroom architecture and its effects, and/or an instrumentalized un- 
derstanding of the impact on judicial practices of changing carceral regimes or the 
imperatives of the political economy. 

An analogue author of a narrative history can more easily be inclusive. In the 
course of ostensible story-telling they may educate the reader almost impercepti- 
bly, skillfully seeding and integrating the necessary information through subtle 
and contextually apposite digression or allusion. Interfaces to data-driven zoom- 
able analysis can and should seek similarly to mitigate the obstacles to properly 
informed interpretation by the broadest possible range of users. It is something 
that Hitchcock and Jackson's macroscope attempts, generously and engagingly, in 
relation to the example of courtroom architecture mentioned, through the provi- 
sion of a three-dimensional visualization of the Old Bailey courtroom (styled for 
one historical arrangement, although potentially in multiple period-accurate 
manifestations). Within this virtual space, users may position themselves to emu- 
late and gain analytical insight from the experience of witnesses, lawyers, jury, 
judge or the accused as they were spatially situated in an environment that embodies


7 Tim Hitchcock, “Big Data, Small Data and Meaning”, Historyonics, 9 November 2014, accessed
June 17, 2023, http://historyonics.blogspot.com/2014/11/big-data-small-data-and-meaning_9.html.
and demands an almost theatrical performance of roles from those participating
in the courtroom proceedings.

As will already be apparent from these relatively simple examples of the 
highly varied states into which data can be configured, all of which would con- 
form to the framework for a Zoomland info-graphical environment, there is a 
high risk that the data explorer will experience both visual and cognitive dis- 
orientation, unmoored from reassuringly familiar conceptual models. The simul- 
taneous availability of multiple options and functions for changing position, 
layout, filters or parameters is a necessity for freedom of enquiry but offers no 
firm footholds. It is therefore imperative that the explorer be best equipped to 
make meaningful progress, which here implies a forward movement through sequential
meaning-making. Which brings us to the principle of the subject-in-transit,
introduced here as a necessary complement to the earlier described principles of 
zoomable axes. 


8 The subject-in-transit 


The principle of the subject-in-transit proposes a mechanism for the multi-factorial 
means of ensuring a situated focalization while exploring the Zoomland datascape 
of an historical knowledge graph. The subject-in-transit comprises a single-entity 
node - a person, object, place, moment in time - or could involve a set of nodes 
which constitute and are contained within an event entity. This node provides a 
persistent object of attention travelling through varying states of data selection (by 
means of filtering and faceting) in combination with a range of graphical layouts: 
until a deliberate decision is taken to transfer focus, the node retains its identity 
through any transition in graphical presentation. Different permutations of selec- 
tion and layout, which may be adjusted experimentally or with particular inten- 
tion, enable varying interpretive possibilities to be discovered in the graphed data. 
Nodes may be laid out according to time or spatial properties, or in a network 
form, with each subject to rules of reorganization triggered by passing designated 
points in any of the axes of Zoom.

In simple terms, nodes might stack or aggregate on a Level of Detail Zoom, 
with the aggregation determined variously by polygonised proximity (map), tem- 
poral subdivision (hours, days, months, years, decades), or different gauges of 
network clustering and affiliation (membership of a guild, apprenticeship to a 
master; frequency of association with a particular crime or area of London; in- 
volvement in a resistance network or military organization). On the Cognitive 
axis, nodes linked to the subject-in-transit in the graph - directly or indirectly, 
though only via edge types which are dynamically determined to reflect contextual
semantic relevance to the current layout - may be exposed and constellated
to suggest new analytical perspectives for the user, with the degree of allowable 
graph traversal determined by the adjustable Zoom level that is set. In this way, 
related people, ideas or objects (spatial/network) or locations and events (spatial/ 
temporal) are displayed, to a greater or lesser extent, and according to further 
dynamically determined principles of relevance. The operation of Zoom on the 
Abstract-Concrete axis cross-references the type of data and associated media 
that are available with the layout type to define the mode of presentation: in spa- 
tial layout mode, the zoom might run from organizing nodes into an abstract bub- 
ble graph visualization according to type (country when in combination with a 
low Level of Detail, parish when high) through to the vertical extrusion of a his- 
torical map underlay at one extreme of the axis, or at the other, an explorable 3D 
representation of the street in which the subject-in-transit was located at that mo- 
ment in historical time. Matrices that define how nodes manifest visually in each 
permutation of Layout, Axis and Zoom level are pre-configured in the course of
data preparation and management. 
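
The operation of the Cognitive axis described above can be approximated, in the simplest case, as a bounded traversal outwards from the subject-in-transit. The Python sketch below is offered under explicit assumptions and is not a description of any existing implementation: the graph is a plain adjacency list, the node names and edge-type labels are invented, the dynamically determined relevance of edge types is reduced to a set supplied by the caller, and the adjustable zoom level is represented by a fixed limit on the number of hops.

```python
from collections import deque


def constellate(graph, subject, allowed_edge_types, max_hops):
    """Return {node: hops} for nodes within `max_hops` of `subject`.

    `graph` maps each node to a list of (edge_type, neighbour) pairs.
    Only edges whose type is in `allowed_edge_types` are followed.
    The subject itself is omitted from the result.
    """
    distances = {subject: 0}
    queue = deque([subject])
    while queue:
        node = queue.popleft()
        if distances[node] >= max_hops:
            continue  # the zoom level forbids travelling further from here
        for edge_type, neighbour in graph.get(node, []):
            if edge_type in allowed_edge_types and neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return {node: hops for node, hops in distances.items() if node != subject}


if __name__ == "__main__":
    # Invented toy graph; labels are hypothetical.
    toy_graph = {
        "pilot": [("flew", "sortie_1"), ("member_of", "squadron")],
        "sortie_1": [("landed_at", "field_A"), ("carried", "agent_X")],
        "squadron": [("based_at", "airfield")],
    }
    operational = {"flew", "landed_at", "carried"}
    print(constellate(toy_graph, "pilot", operational, max_hops=1))
    print(constellate(toy_graph, "pilot", operational, max_hops=2))
```

Raising the hop limit widens the constellation of context, while changing the set of permitted edge types changes its character without moving the subject-in-transit.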

Such an approach is consistent with a mode of historiographical knowledge 
production that seeks to induce reflexivity, and which embraces zoom as a means 
to engage and shift affective identification. It enables more nuanced engagement 
with the polyvocality, uncertainty and ambiguity generated by the tensions across 
scales of data, while enabling narrative devices that may challenge the user-actant
with the unrecognized biases encoded in their exploratory choices. Drawing
on Marian Dörk's theorization of the “information flaneur” as an imagined
percipient casually constellating their information environment, the principle of
the subject-in-transit emphasizes instead the chosen ego node within the knowledge
graph as a sustained guarantor of coherence (Dörk et al. 2011).

Hitchcock and Jackson's The Old Bailey Macroscope has already sketched an 
application of the principle, although not explicitly elaborated as such or grounded 
in a knowledge graph. Within the Macroscope, the user is able to switch between 
layout views while maintaining a persistent focalization on a single subject-entity, 
in this case the trial. Consequently, the hearing that has been presented in immer- 
sive form may also be identified and highlighted in the scatterplot, or vice versa, 
the 3D courtroom may be accessed via the selection of a single scatterplot glyph. It
may also be examined, singly or comparatively, as a sequence of standardized 
courtroom discourse elements, derived from a dataset created by Magnus Huber 
and here presented as a sequence of strips on a horizontal bar graph, color-coded 


8 “Old Bailey Voices”, accessed February 18, 2023, https://oldbaileyvoices.org/macroscope.php.


by discourse type and linked through to the transcript of the trial.⁹ Whilst sometimes
available in a dashboard form, meaning is inherent more in the counterpoint
of layouts, yet the transition between layouts is abrupt rather than fluid. That is, 
when compared with filmic grammar, it aligns more closely with a “cut” than an 
animated zoom, although the shifts in scale certainly embody something of the lat- 
ter effect, albeit with a loss in selectively carried over context. The Old Bailey Mac- 
roscope therefore offers a highly revealing prototype and illustration of zoom, 
notably on the abstract-concrete axis, with two distinct explorable visualizations of 
data that sit respectively towards - but not quite at - its two extremes.

These identifiable lacunae in the Old Bailey Macroscope are, however, them- 
selves usefully suggestive of challenges and opportunities. The scatterplot layout posi- 
tions its visual representation of data on axes that carry human-readable meaning, 
albeit requiring implicit expert knowledge to interpret, while the 3D courtroom scenario
intentionally distances (even alienates) the user from a sense of immersion: the
space is not “skinned” in surface detail, the human figures are near-skeletal ma- 
quettes; the voicing of the trial is machine-generated, uninflected and not distin- 
guished by speaker. It is a project that leaves the dramatization to the television 
fictions that it has informed. However, the extreme of the abstract-concrete axis 
might allow for further progress in that direction, which in turn could activate a 
data-orientated negotiation between the levels of evidential certainty (Section 11) and 
the speculative affordances involved in immersive evocation. Moving from the 3D 
courtroom, along the axis in the other direction, towards abstraction, it has not yet 
tested either the territory of meso-level abstract-concrete zoom where, for example, 
the visual interface might allow for the experimental correlation of trial duration 
data directly with modification of courtroom design, expressed perhaps in more dia- 
grammatic terms of space syntax and topic-annotated graphs of courtroom dis- 
course. Nor does the Old Bailey Macroscope venture to the further extremes of 
abstraction, of the underlying multi-dimensional knowledge graph, in whose visual- 
ization even the guide ropes of expert knowledge must be cast aside in favor of the 
emergent, if imminently inchoate: one whose analytical potential will be touched 
upon in Section 12. 


9 The data finds its shape within the framework 


For the three projects which I have discussed as exemplary case studies for a pro- 
spective Zoomland approach, however, the balance of considerations is different. 


9 http://hdl.handle.net/11858/00-246C-0000-0023-8CF, accessed February 18, 2023.
In two of these, the scale of the data is somewhat or substantially smaller, in
terms of the total sources, indicating the aforementioned focus on meso and 
micro scales of zoom on most axes, but the detail of the modelling substantially 
greater, while its semantic grounding amplifies the structure of the graph itself, 
as a carrier of meaning. In the third case study, coincidentally developed from 
the Old Bailey dataset, the data in question is derived from textual analysis at 
scale and is loosely semantically modelled, and for purposes of zoomable analy- 
sis, roughly compatible with that of the other two projects. The effect envisaged is 
to increase the amplitude of analysis along the Abstract-Concrete axis, most nota- 
bly in combination with Cognitive Zoom, by which the Subject-in-transit can be 
contextualized. 

A recent and highly effective use of Virtual Reality for historical evocation, 
consonant with the Second World War in the air subject matter of the Lysander 
Flights, was 1943 Berlin Blitz, produced by the BBC.¹⁰ The experiment is situated
in a Lancaster Bomber as it flew in formation through heavy flak to rain incendi- 
ary devastation on the German cities below. The experience creates a powerfully 
immersive visual and physical effect, with its emotional impact intensified by the 
use of a contemporary voice-over report by Wynford Vaughan-Thomas, a radio
journalist who accompanied a mission with his sound engineer. For fifteen mi- 
nutes, in an abbreviated version of the eight-hour journey, the lived experience 
of the terrified crew is reanimated. Its position on the Abstract-Concrete axis is 
clear, with the voice of the reporter, equipped with contextualizing information 
(carefully censored and skewed towards propaganda), fulfilling something of the 
function of Cognitive Zoom. How though might Cognitive Zoom, deployed at a 
more abstract point on the axis, generate an equivalent intensity of understand- 
ing and even empathy by revealing patterns in the data that evoke the inner 
world and thought processes of the subject's situated experience?

The subject-in-transit here might be, by closest analogy, the pilot of the Lysander 
aircraft on an equally hazardous operation to infiltrate agents into occupied France. 
We might be introduced to the interiority of the experience by visualizing data that 
reveals the intensity of their schedule of sorties, how much recovery time they have 
enjoyed between, or how much leave they have taken in the last month, from which 
we might derive the imagined level of stress and fatigue; or the same question might 
be approached through how many crews' and colleagues' names they have watched 
wiped from the blackboard in the previous forty-eight hours of 1943 as casualties on 
similar missions (by way of reference to a common visual trope of combat films). 


10 “1943 Berlin Blitz in 360°”, accessed February 18, 2023,
https://www.bbc.com/historyofthebbc/100-voices/ww2/360berlin.
Or it could be a criminal defendant, or an instrument maker, or even a scientific
instrument. What, we might ask, are the chances of conviction for a girl of sixteen 
brought to the Old Bailey in the 1760s for the theft of a gentleman’s handkerchief in 
a theatre, what is the likely sentence they will receive, and in what language are 
they most likely to hear themselves described? Or for a young Italian immigrant 
maker of chronometers who has just won a silver medal in the Greenwich Observa- 
tory trials of 1854, coming in behind a prominent London-based company with a 
three-generation lineage of supplying the Admiralty, how many chronometer makers
who have been in a similar position to him still have a going business two decades 
later? In summary, is this a world hospitable to talented incomers and, if so, what
are the secrets of success? Perhaps that is a question for which the answer is carried 
by an object rather than a person: a chronometer that has been in service for deca- 
des, was assembled from parts supplied with recognition by half a dozen artisanal 
enterprises, with the elegant housing made of an unusual alloy of unknown origin 
detected by X-ray fluorescence analysis, was stamped by an eminent Edinburgh
maker, and which had been regularly repaired by the same workshop twice a year, 
before its retirement from Admiralty service and acquisition by the National Muse- 
ums of Scotland, its eminence secured by careful curatorial preservation and thanks 
to its regular exhibition. Yet how truly representative is it of the hundreds of similar 
instruments supplied to the Royal Navy’s ships throughout the nineteenth century? 


10 Narrative exploration as hypothesis forming 
and testing 


For the data to be so revealingly articulate, of course, it must be sifted and sorted, 
filtered and faceted using compound Boolean arguments, and rendered legible to 
others as a set or sequence of related insights, graphically presented and, where 
necessary, commented with explanation. The initial experience of such multi- 
axial zoom environments is almost certain to be haphazard, even in the hands of 
an expert operator, and resistant to easy interpretation. Its navigation and the 
construction of meaning with it entails an iterative process of trial and error, of 
hypothesis formation and testing, and a responsiveness to unanticipated possibili- 
ties and unexpected insights. In terms of a design specification for a supportive 
graphical interface, it must encourage rather than inhibit speculative forays, with 
the operator confident of retaining or easily regaining their orientation. 

Where a user-authored sequence of states departs from an existing argument 
or narrative account, with which it shares certain states or state-pairs in com- 
mon - having started, by definition, with all in common - or where its structure 
in development ex nihilo is algorithmically identified as correlating with significant
states and state-pairs within such an existing account, the points and the extent
of agreement or divergence between the new and prior sequences may also
be graphically indicated. It is quite possible that such relationships will be discov- 
ered between the current authored sequence and multiple analogous but some- 
what contesting accounts, each of which will offer its distinct interpretation, to 
produce further graphing of intersection and divergence. In such cases, the com- 
parison that is graphically indicated at any time will be between the active ac- 
count and a single correlated sequence, but with the option to cycle through the 
other sequences, for paired comparison. By these means, a dialogic approach to 
negotiating authority is encouraged whilst the legibility of difference is also en- 
sured, by dint of managing its parameters. The sense of empowerment experi- 
enced by the explorer in turn prompts them to contribute additional evidential 
data in support of their narrative or argument. 

The comparison of arguments and narrative accounts requires, of course, a 
state machine - actively recording states of layout preference and data configura- 
tion - in order to ensure that the branching paths of experimentation can be fol- 
lowed, backtracked, retraced and deviated from, in an arborescence of growing 
complexity but also redundancy. Ideally, this state machine management would 
include some means to mark those paths most favored, as the argument that they 
represent becomes more established, whether by authorial annotation or fre- 
quency of iteration: tracks through a network of states and edges that the inter- 
face may inflect with visual prominence, as in the erosion of desire lines in 
physical landscapes. 
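
The state-recording mechanism described above can be sketched as a small tree of visited states with per-edge traversal counts. This is only a minimal illustration in Python, not the chapter's implementation: the class name `ExplorationHistory`, the representation of a state as a (layout, filter) pair and the use of traversal counts as a proxy for "desire lines" are all assumptions made for the example.

```python
from collections import Counter


class ExplorationHistory:
    """Records visited visualization states as a branching tree (hypothetical design)."""

    def __init__(self, root_state):
        self.parent = {root_state: None}   # state -> state it was first reached from
        self.children = {root_state: []}   # state -> states branched to from it
        self.edge_visits = Counter()       # (from_state, to_state) -> times traversed
        self.current = root_state

    def move_to(self, state):
        """Step from the current state to `state`, creating a new branch if needed."""
        if state not in self.parent:
            self.parent[state] = self.current
            self.children[state] = []
            self.children[self.current].append(state)
        self.edge_visits[(self.current, state)] += 1
        self.current = state

    def backtrack(self):
        """Return to the state from which the current one was first reached."""
        previous = self.parent[self.current]
        if previous is not None:
            self.current = previous
        return self.current

    def path_to_current(self):
        """Retrace the sequence of states from the root to the current state."""
        path, state = [], self.current
        while state is not None:
            path.append(state)
            state = self.parent[state]
        return list(reversed(path))

    def favoured_edges(self, min_visits=2):
        """Edges traversed at least `min_visits` times: candidates for visual emphasis."""
        return [edge for edge, n in self.edge_visits.items() if n >= min_visits]


# States are illustrative (layout, filter) pairs, not taken from the chapter.
history = ExplorationHistory(("map", "all"))
history.move_to(("map", "1850s"))
history.move_to(("timeline", "1850s"))
history.backtrack()
history.move_to(("timeline", "1850s"))   # same step repeated: the path is "worn in"
```

In this sketch, an interface could use `favoured_edges()` to decide which transitions to render more prominently, while `path_to_current()` supplies the sequence that would be compared against an existing account.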


11 Variable authority as a further axis of zoom 


To complement our three core axes of zoom, a fourth and distinct type is implied by 
this capacity for anticipatory probing of the data: one that runs from looseness to 
certainty, or from a dispersed to a hierarchical authority. The axis of zoom is, in this 
case, manifest within the graphical environment as a scalar of inclusiveness that 
can be adjusted — or, in more tangible User Experience terms, ‘brushed’ - to pro- 
duce either a more generously broad, or a narrower and more tightly defined focali- 
zation of the interpreted graph. This conceptual focalization — which may also be 
considered as the amplitude allowed by the chosen depth of focus - is technically 
realized as the degrees of node-traversal in the knowledge graph, filtered for availability
by their confidence scores, that are enabled during the resultant query.
To this end, the (metaphorical) tuning-dial is calibrated to winnow data according
to their provenance metadata, against two measures. The first is by type
and level of authority, conferred by externally validated status. This might be var- 
iously by reference to a relevant publication with a Unique Identifier that has 
been validated by an authority source, or else by reputation acquired among the 
contributing community, for example by the collective validation of contribu- 
tions, of which scored records are maintained internally. The second is conferred 
by the reliability of the interpretation of the relationship between data repre- 
sented by that edge: calculated as a combination of the confidence score as as- 
signed by contributors and/or as probabilistically determined in the machine- 
learning process. 
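
As a rough illustration of how such a dial might winnow a graph, the sketch below filters edges against two thresholds and then collects the nodes reachable from a focal node within a bounded number of hops. The field names (`authority`, `confidence`), the 0-1 scales, the undirected traversal and the example data are assumptions made for the sake of the example, not the chapter's data model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    authority: float   # assumed 0-1 score for externally or community-validated status
    confidence: float  # assumed 0-1 score for the reliability of the asserted relation


def zoom_in_certainty(edges, focus, min_authority, min_confidence, max_hops):
    """Return nodes reachable from `focus` over edges that pass both thresholds."""
    neighbours = {}
    for e in edges:
        if e.authority >= min_authority and e.confidence >= min_confidence:
            # Treat relations as undirected for traversal (a simplifying assumption).
            neighbours.setdefault(e.source, set()).add(e.target)
            neighbours.setdefault(e.target, set()).add(e.source)

    reached = {focus: 0}            # node -> hop distance from the focal node
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if reached[node] == max_hops:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in reached:
                reached[nxt] = reached[node] + 1
                queue.append(nxt)
    return set(reached)


# Invented example data.
edges = [
    Edge("chronometer", "maker", authority=0.9, confidence=0.8),
    Edge("maker", "workshop", authority=0.6, confidence=0.9),
    Edge("chronometer", "ship", authority=0.4, confidence=0.3),
]

loose = zoom_in_certainty(edges, "chronometer", 0.3, 0.3, max_hops=2)
strict = zoom_in_certainty(edges, "chronometer", 0.8, 0.7, max_hops=2)
```

Raising the thresholds shrinks the returned set and lowering them widens it; `max_hops` plays the role of the permitted degrees of traversal.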

Whilst an aggregation or disaggregation of nodes occurs during a zoom-in- 
resolution, a zoom-in-certainty is manifest simply as the addition or removal of 
nodes, as they fall within or without the acceptable range. The nodes and edges constellated
in the micro-graph freeze and preserve user-focused interest at moments
of transition between layouts and modes: concepts which are themselves explicitly 
modelled within the knowledge graph. However, at any time it should be possible 
for the user to surface - with minimal active enquiry (one or two steps of interac- 
tion) - the specific provenance data pertinent to the set of nodes and edges cur- 
rently filtered, or to any individual node or edge visible, and to access this data 
either as an annotation or, wherever possible, at one further remove, by direct linkage
to source. Such information is then presented: displayed as text, perhaps, either
in a floating pop-up window associated with graphic elements, or within a dedi- 
cated side bar window, or as speech, whether pre-recorded or synthesized using 
text-to-speech methods.


12 Conclusion: Towards a narrative grammar 
of zoomable exploration 


Behind the framework for managing the fluid visualization of historical data in narrative
form that is described in this chapter lies a process grounded in the modelling
and later querying and analysis of the knowledge graph. This involves, fundamen- 
tally, the translation of one graph form into another: from a knowledge graph for- 
malized by alignment with formal ontologies that carry rich implicit knowledge, into 
contextually determined sub-graphs that are extracted and made accessible to the 
data explorer according to the specific requirements of the moment. These sub- 
graphs constellate ego networks which are centered on the subject-in-transit, with 
various allowed affordances of graph traversal and with exploration possible to 
varying degrees of amplitude. Neither a general nor particular definition of the algorithmic
substrate of such a graph translation system, by which the constraints are
dynamically applied, will be attempted here. It will develop through accretion, as a 
product of the repeated practice of expert exploration and enquiry: rehearsals of 
human expertise that are manifest as the authoring of narrative sequences out of 
visualization states. This contribution of candidate edges and the new micro-graphs 
that they link together will be amplified through the application of machine learning 
methods, trained on those human-produced accounts, whether narrative or argu- 
mentative in form. 

In conclusion, it is possible to note how this more sophisticated level of 
knowledge extraction and narrative modelling depends on abstraction in two 
axes, combined with concrete specificity in another. The abstraction takes the 
form of high-dimensionality vector spaces, on the one hand, representing the 
graph of semantic associations, in which a process of clustering by variable similarity
or affinity of vector pairs - or, in more refined instances, of matrices - signifies
potential relevance. Within the Zoomland graphical environment, these
might be inspected for utility and interest through the application of unsupervised
non-linear dimensionality-reduction algorithms, such as UMAP or t-SNE,
and may even be human-labelled in situ to augment the explicit record of the
knowledge graph. Meanwhile, and by extreme contrast, concrete specificity is to
be discovered in the character of those more visually immersive states, ranging 
up to the level of three-dimensional photorealism. As will be apparent, a crucial 
role is played in this by the second axis, of Abstraction-Concreteness. 
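
To make the notion of "similarity or affinity of vector pairs" concrete, the following sketch computes cosine similarity between a few hand-made vectors and lists the pairs that exceed a threshold. The vectors, labels and threshold are invented for illustration. Real embeddings would have far more dimensions, and projecting them with UMAP or t-SNE would rely on a dedicated library, which is not shown here.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_pairs(vectors, threshold):
    """Return label pairs whose vectors are at least `threshold` similar."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(vectors.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Toy three-dimensional "embeddings" (invented values).
vectors = {
    "chronometer": [0.9, 0.1, 0.0],
    "sextant": [0.8, 0.2, 0.1],
    "harvest": [0.0, 0.1, 0.9],
}

candidates = related_pairs(vectors, threshold=0.9)
```

Pairs returned by `related_pairs` are only candidates for relevance; in the workflow described above they would still be inspected, and possibly labelled, by a human explorer.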

The interplay of these two axes may be effectively combined in narrative sequences,
with the modes and layouts of each state determined by the specific permutation
of the axial configurations. To achieve the desirable coherence of
narrative construction, however, will additionally require the application of a 
narrative grammar of state sequencing, one whose definition might draw on re- 
search such as that by Neil Cohn into the ‘structure and cognition of sequential 
images’, in which the ordering and interplay of knowledge faceting has the poten- 
tial to generate ‘third meanings’ as powerfully as more purely figurative imagery 
(Cohn 2013). These meanings may be apprehended analytically or in more purely 
affective terms, with the most skilled exploratory authors orchestrating their ar- 
rangements into compelling, informative and persuasive accounts. The antici- 
pated outcome of this next step into Zoomland will be exploratory adventures 
that may stand comparison with the best linear forms of narrative history. 

In the fullest realization of this half-prototyped medium, we can imagine Che- 
khov's moon seen through the eyes of an agent in the secret war of 1943 — a passen- 
ger in the rear cockpit of a Lysander aircraft, exfiltrated from a field in occupied 
France and huddled over a thermos of coffee laced with rum - as it reflects off the 
silvery French waterways below: a living map by which the pilot is tracing a route
to safety. That tangible journey, though, is also a trace through contextual knowl- 
edge: the spine of a story, dynamically rendered across the myriad synapses of a 
semantically modelled knowledge graph. 


@ Capturing Discourse through the Digital
Lens: Towards a Framework for
the Analysis of Pro-democratic Discourse
in the Weimar Republic


Abstract: Scalable reading has become a pathbreaking approach for discourse 
studies in Digital History. While large-scale analysis broadens the examination of 
primary sources and explores discourse features independent from the historian’s 
experience, close inspection evaluates findings in light of the historical context. 
However, if we want to bring the best of both worlds together fruitfully, methods 
must be geared to the material, the discourses under scrutiny, their historical con- 
texts and our epistemic interests. This paper proposes a methodological framework 
of scalable reading, specifically for pro-democratic discourses in the Weimar Re- 
public. Accessing democratic thinking in Weimar Germany’s fragmented political 
culture has been a significant concern of historical research. This includes examin- 
ing newspapers, as they represent opinion-shaping mass media. However, histori- 
ans have not exhausted this potential, and they have hardly engaged with scalable 
reading. This paper aims to close this gap by outlining the primarily heuristic bene- 
fits of scalable reading for studying democratic discourses in Weimar’s press. 


Keywords: scalable reading, discourse analysis, heuristics, Weimar Republic, 
democracy 


1 Zooming in, zooming out: Extending the 
toolset for examining the political culture 
of Weimar Germany 


1.1 Research on discourses as scalable reading 


The digital humanities have invented an array of large-scale text analysis techni- 
ques, which make it possible to access extensive document collections that scholars 
could only partially read manually. While these quantitative techniques have suc- 
cessfully been applied in various research domains, such as historical discourse 
analysis, many scholars rightfully warn that analysis results still need contextualization
and interpretation. As Silke Schwandt put it in a nutshell, quantification results
are not per se meaningful, numbers are not the same as representativity, and
analysis visualization, too, requires interpretation (Schwandt 2016). What is needed 
are approaches to fruitfully combine the computer’s potential of gathering statisti- 
cal information on the material’s contents and the scholar’s experience to contextu- 
alize and interpret that information. This challenge has been addressed for some 
time with respect to concepts of “scalable reading”¹ or “blended reading” (Stulpe
and Lemke 2016). These umbrella terms stress the metaphor of zooming in and out 
on document collections. However, only concrete methodologies aligned to specific 
research questions and objects clarify how the zooming movements may and 
should work. In research on the History of Concepts, for instance, we might want 
to detect all the occurrences of a specific term and have a closer look at them. Dis- 
course studies, in contrast, typically depend less on word occurrences. Zooming 
into text passages of a particular discourse does not (solely) require finding key- 
words but also identifying discourse contents independent from specific terms 
(Oberbichler and Pfanzelter 2022: 136-137). What we need is to achieve a solid un- 
derstanding of scalable reading methods as defined areas of application that dem- 
onstrate the potential and limits of those instrumental means. 

In this chapter, I draw from existing projects of digitally assisted discourse
analysis and extend their methods in order to substantiate scalable reading for historical
discourse analysis. In my view, scalable reading approaches are of great
value for studying discourse because they substantially help to meet the common challenge
of examining complex networks of semiotic practices while working with
many primary sources. Regardless of any specific theoretical and methodological 
underpinning or definition of discourse, the quest is, broadly speaking, to identify 
meaning. Meaning is expressed by historical actors and attributed to different (so- 
cial, political, cultural) phenomena. Historians impose epistemic questions and 
perspectives on these phenomena, thus also inscribing meaning into their re- 
search objects. Therefore, discourse is a complex phenomenon that often requires 
looking over a vast number of primary sources, which underscores the impor- 
tance of heuristics — the historian’s traditional task of identifying, selecting and 
gathering material relevant to a specific epistemic interest. 

Sarah Oberbichler and Eva Pfanzelter (2022) reasoned about the potential and 
challenges of digitally assisted discourse analysis with a focus on heuristics. To do 
so, they discussed remigration discourses in modern and contemporary Austrian 
history, traced by historical newspaper analyses. Oberbichler and Pfanzelter argue 


1 Martin Mueller, “Scalable Reading,” Scalable Reading (Weblog), accessed May 2, 2022,
https://scalablereading.northwestern.edu/?page_id=22.
that searching digital newspaper archives, compiling a corpus of relevant texts,
and exploring and interpreting these texts come with extra challenges. For instance,
simple keyword searches and frequency analyses cannot trace the discourses
of interest sufficiently because discourses tend to be independent from
concrete wording, as mentioned above. In promoting digital source criticism and 
methodological reflection, both authors propose combining different means like 
absolute and relative frequency analyses and text mining techniques. The latter 
can find new keywords and statements that the historian might not have thought 
of before. In total, Oberbichler and Pfanzelter provide valuable methodological 
insights into how to (1) compile a corpus of relevant primary sources, (2) enhance
the overview and orientation for exploring the corpus, and (3) “dig deeper
into the historical-critical method in the digital age” (2022: 127). This innovative
understanding of digitally assisted heuristics expands the toolset for discourse 
research because it complements the historian's experience-based searches with 
techniques “making the search less influenced by the researcher's prior knowledge”
(2022: 147).² In doing so, Oberbichler and Pfanzelter are aware that any
specific design of heuristic methodology depends on the contents and nature of 
discourses at hand, the discourse arena, and, ultimately, the research interest. 
When dealing with predominantly emotionalized language, for instance, senti- 
ment analysis techniques become more relevant than for discourses of a rather 
pragmatic and rational language use. 
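
The difference between absolute and relative frequency can be shown with a few lines of Python. The mini-corpus, the search term and the whitespace tokenization below are placeholders chosen for the example; they do not reproduce Oberbichler and Pfanzelter's workflow.

```python
from collections import defaultdict


def keyword_frequencies(documents, keyword):
    """Per-year absolute count of `keyword` and its share of all tokens."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in documents:
        tokens = text.lower().split()      # naive tokenization, for illustration only
        totals[year] += len(tokens)
        hits[year] += tokens.count(keyword.lower())
    return {
        year: {"absolute": hits[year], "relative": hits[year] / totals[year]}
        for year in sorted(totals)
        if totals[year] > 0
    }


# Invented mini-corpus of (year, text) pairs.
documents = [
    (1950, "return of emigrants debated"),
    (1950, "the return question"),
    (1960, "return return and a long report on many other matters entirely"),
]

frequencies = keyword_frequencies(documents, "return")
```

Both years contain the keyword twice, but its share of the text is higher in 1950 because less text survives for that year. This is why the two measures are usually read together.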

My attempt at scalable reading follows a similar approach to Oberbichler's and 
Pfanzelter's, geared to the specific case study of the Weimar Republic. Germany's 
first democracy was highly contested in many respects (Büttner 2008: 729). In a 
fragmented and polarized landscape of political discourse between the two World 
Wars, much uncertainty existed about fundamental concepts of German society. 
For instance, stakeholders of different political orientations battled over the definition
of “democracy.” Drawing from the existing research on Weimar's political culture³
and its engagement in discourse studies, I consider this case highly relevant
to approach a methodology of scalable reading: Weimar's intricate discursive land- 
scape has forced many historians to downsize their research scope to specific 
groups, local contexts, discourse topics, or smaller collections of primary sources. I 
intend to show that scalable reading promises to broaden the scope and include 
more discourse contributions to deepen our understanding of political thinking. I 
focus on pro-democratic statements in newspapers, especially by defenders of the 


2 Here, the authors specifically refer to text mining methods. 
3 Historical research on political culture focuses on modes and contents of perception and the 
constitution of meaning by historical actors in political contexts (Hardtwig 2005a). 


46 —— Christian Wachter 


republic reacting to far-right attacks on Weimar’s democratic system and democ- 
racy as such. By tracking these statements, their connections, and proliferation, 
scalable reading techniques, as I discuss them in this chapter, substantially support
historical heuristics. This potential is even enhanced when scholars visualize their 
heuristic findings in a well-structured overview for subsequent interpretation. Con- 
sequently, these methods must be chosen and geared to the underlying research 
context. Instead of reasoning about scalable reading “as such,” I try to demonstrate
its benefits for a defined area of historical discourse studies that, at the same time, 
can serve as a springboard for similar research endeavors. 

Therefore, I aim at contributing to a genuine digital history methodology. Simul- 
taneously, this chapter contributes to the research on the Weimar Republic, which 
has hardly seen studies that employ a scalable reading approach on discourses. In 
conducting pilot studies, I intend to gain insights into Weimar's democracy discourses,
on the one hand, and to receive feedback for adjusting the methodological
framework, on the other. Ultimately, my approach is primarily intended to enhance the toolset for
examining political discourses in order to better understand the ideas and opinions on democracy
during an essential period of Germany's history of democracy.


1.2 Getting a grip on the complexity of discourse 


Discourse is a phenomenon frequently described by spatial and pattern-based 
metaphors: People have ideas and perceptions of reality, and they utter them in a 
discursive space, a specific cultural, social, political, etc. arena of sense-making. 
Fueled by such contributions, discourses often overlap, for instance, when criti- 
cism of governmental decisions goes hand in hand with general demands for 
more public participation in politics. Discourse participants affirm or object to 
each other, forming a network of discursive negotiations. Some comments have a 
larger impact than others or might even be hegemonic. Subsequent statements
then replicate or build on the original message, and a discursive line emerges.

To be sure, there is much more to say about discourse, its competing definitions, 
or analytical approaches. What the metaphors above already reveal, however, is 
that discourse analysis deals with a complex phenomenon consisting of many con- 
stituents. Researchers must detect and interrelate them to gain knowledge from 
their investigation. And that is, above all, learning about the discursive creation of 
meaning - meaning that shapes reality, in the Foucauldian sense: It makes a difference
to call an anti-governmental uprising a “freedom movement” or an “insurrection.”
If one of these topoi becomes dominant for large parts of society, it is not just
a difference of personal opinion. Instead, it is a difference in perceived reality — a 
reality that influences subsequent political judgments, power relations and actions. 
Getting a grip on such a complex research object often demands access to or
selection from large collections of primary sources, regardless of any analog or 
digital methodology. In detective work, one must identify essential stakeholders, 
prominent discursive subjects, and the proliferation of topoi in society. While 
quantitative research answers this challenge by accessing material masses, quali- 
tative studies carefully find cross-sections for downsampling. It might seem that 
digital humanities methods facilitate mainly quantitative approaches, given that 
enough machine-readable material is available. This is because distant reading 
techniques allow the inspection of massive amounts of text. For example, topic 
modeling traces potentially meaningful clusters of words as they occur in large 
corpora. These corpora are meaningful because they comprise documents se- 
lected by relevance for specific research interests. On this basis, topic modeling 
statistically captures terms with patterns of co-occurring words hinting at candi- 
dates for relevant subject matters. This can give insights that are not possible to 
achieve without the computer. However, DH scholars regularly warn that the 
term “topic modeling” is misleading because the computer does not model topics 
in a narrow sense, but instead statistically identifies word groupings that might 
signal topics. Scholars still must interpret the analysis results. As Amelie Kutter
(2017) pointed out more generally, large-scale analyses have their limits and their 
promises can be all too tempting. She argues that corpus analysis does not help 
us much to reveal the (social, political, etc.) context that is decisive for the mean- 
ing of a statement. The occurrence of specific terms or concrete phrasing often 
does not reflect the underlying discourse, which is the real object of scrutiny. 
Dodging formulations, indirect references, coded language, neologisms, irony, etc. 
are to be mentioned here. These challenges add to obstacles like misspellings,
idioms or abbreviations, which affect the word level of keyword searches and 
can be tackled by the use of controlled vocabulary (Blair and Carlson 2008). More- 
over, discourse analysis is usually interested in what has not been uttered at all 
and why this is the case (Kutter 2017: 172). The absence of a particular phrasing or 
the neglect of a specific topic might point to different things, for instance, censorship.
Additionally, the political climate could be heated to such a degree that political
stakeholders strategically refrain from stating claims that are, in fact, part of
their convictions. In her study on the normalization of contemporary far-right 
discourse, Ruth Wodak points out that anti-immigrant statements operate on the 
verge of the sayable: Ambivalent messages “require great efforts in terms of argu- 
mentation and legitimation strategies, which always have to accommodate the 
routinely sayable and unsayable in a specific context” (Wodak 2021: 58). Identify- 
ing what was sayable and unsayable in a given context bears important informa- 
tion on the nature of particular discourses, political and social developments, also 
in historical research (Steinmetz 1993). 
To be sure, topic modeling, word embeddings or other digital approaches and
tools do address these challenges. Textometry⁴ and SCoT,⁵ for instance, are specialized
in comparing texts and corpora to trace the (changing) meaning of terms. They
also spot absent or underrepresented words and phrases. This can be utilized to compare
discourse contents and style. As another example, DiaCollo⁶ targets diachronic
collocation analysis: Users may explore the surrounding wording of a defined signal 
word and compare such findings between corpora to identify word meaning shifts 
over time. The collocations may point to different thematic contexts in which a signal 
word was treated. They may reveal that the word under scrutiny was consistently 
uttered in statements of emotional language. Another finding could be that the fo- 
cused term received changing meaning, observable for specific periods. Word em- 
beddings have been utilized similarly (Hengchen 2021). Such techniques bear great 
potential, especially for discourses circling around particular names. 
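
The basic idea behind a diachronic collocation analysis can be sketched as follows: count the words that appear within a fixed window around a signal word, separately for each period, and compare the results. This is a simplified stand-in and not DiaCollo's algorithm or interface; the window size, the sample sentences and the function names are assumptions.

```python
from collections import Counter


def collocates(texts, signal_word, window=2):
    """Count tokens occurring within `window` positions of `signal_word`."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if token == signal_word:
                start = max(0, i - window)
                context = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts.update(context)
    return counts


def collocation_shift(corpus_by_period, signal_word, window=2):
    """Map each period to the collocate counts of `signal_word` in that period."""
    return {
        period: collocates(texts, signal_word, window)
        for period, texts in corpus_by_period.items()
    }


# Invented sentences, grouped by period.
corpus_by_period = {
    "1919-1923": ["the young republic must be defended", "our republic needs loyal citizens"],
    "1930-1933": ["the failed republic is finished", "this weak republic cannot last"],
}

shift = collocation_shift(corpus_by_period, "republic")
early_only = set(shift["1919-1923"]) - set(shift["1930-1933"])
```

Collocates that appear in only one period, such as those collected in `early_only`, are the kind of finding that would then be checked by close reading of the passages concerned.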

In her study on Irish collective identity construction and nationalism, Maëlle
Le Roux examined the Irish periodical Capuchin Annual from 1930 to 1977, with a 
case study focusing on representations of 17th-century English statesman Oliver 
Cromwell as an object of projection for anti-English and pro-Irish sentiment. To 
do so, Le Roux analyzed articulated references to Cromwell but also their ab- 
sence, “as an absence is a representation in itself” (Le Roux 2021: 49). Irish iden- 
tity construction is thus traced by occurrences of the name Cromwell and other 
signal words, by co-occurrences of neighboring terms, and concordances that re- 
veal characterizations of Cromwell. Complementarily, Le Roux spotted missing 
occurrences and descriptions in articles of different issues, years and authors. 
The statistical results were inspected in close reading and interpreted with regard 
to the historical context, combining approaches of history of representations, Crit- 
ical Discourse Analysis (CDA), and corpus linguistics. In doing so, Le Roux gained 
a better understanding of the construction of collective Irish identity because the 
research design successfully revealed the contexts and connotations of (missing) 
Cromwell representations. However, her example also points to the limits of anal- 
yses that focus on specific words and phrases. This is because scholars often do 
not know (yet) what particular phrasing to look for, or the wording is not at all 
consistent for a given discourse. Topic models and co-occurring adjectives may 
lead to some patterns of how historical actors are characterized in nationalist 
statements. These adjectives might be tested on their co-occurrence with further 


4 “Textométrie,” TXM, accessed October 5, 2022, https://txm.gitpages.huma-num.fr/textometrie/. 

5 “SCoT: Sense Clustering over Time: a tool for the analysis of lexical change,” ACL Anthology, 
accessed October 5, 2022, http://dx.doi.org/10.18653/v1/2021.eacl-demos.23. 

6 “DiaCollo: Kollokationsanalyse in diachroner Perspektive,” CLARIN-D, accessed October 5, 
2022, https://www.clarin-d.net/de/kollokationsanalyse-in-diachroner-perspektive. 
words for exploring more nationalist formulations. But what if we do not have
enough of such initial stepping stones and miss key signal words? In this context, Le
Roux herself underscores the challenges of identifying paraphrases used in the 
examined texts (Le Roux 2021: 36). 

Analyzing and clarifying ambivalent language use, as I address in this section, 
still requires a great deal of close reading in the initial stages of analysis. This raises
the question of the role of qualitative and quantitative approaches in a discourse 
research design. Should the investigation be primarily qualitative, with large-scale 
analyses complementarily exploring terms and phrases that a historian has not 
thought of? Are quantitative approaches, in that sense, of auxiliary use in methodo- 
logical triangulation? Or do they form the foundation of the analyses? The answer
surely depends on the applied definition of discourse and the concrete research in- 
terest. As I want to substantiate in the following sections, I follow the first option. 
Keyword or phrase searches can, in my opinion, only provide a rough entry 
point for spotting, collecting and interrelating primary sources for assessing pro- 
democratic discourses in the Weimar Republic. At its core, the heuristic methodol- 
ogy must respect the discourses' pronounced independence from specific word use. 
Therefore, scholars may manually choose cross-sections of material, for instance, 
newspaper issues published right after political events that impacted political dis- 
course. This way, the scholar's experience and intuition compensate for what 
word-based or phrase-based analyses miss. This observation is in sync with Kutter 
when she argues that corpus analysis is no appropriate replacement for thorough 
interpretation “[p]recisely because of its selective focus on the distributional properties
of words” (Kutter 2017: 184). Instead, corpus analysis is understood as an
“explorative technique for heuristic and reflexive purposes” (Kutter 2017: 170).


1.3 Discourse analysis and scalable reading 


Following such warnings, the purpose of large-scale analyses should be conceptu- 
alized to identify conspicuous spots and patterns that are worth being consulted 
for closer inspection. This makes nuanced concepts of scalable reading salient. 
Martin Mueller emphasizes the notion of “digitally assisted text analysis,” in which
the operative word is “assisted.”⁷ In that sense, literary studies profit from searching
and identifying textual features like grammatical patterns (zooming out) as a


7 Martin Mueller, “Morgenstern's Spectacles or the Importance of Not-Reading,” Scalable Reading
(Weblog), accessed May 2, 2022,
https://scalablereading.northwestern.edu/2013/01/21/morgensterns-spectacles-or-the-importance-of-not-reading/.
basis for close inspection and interpretation (zooming in). Similarly, Alexander
Stulpe and Matthias Lemke understand “blended reading” as a framework to ac- 
cess social reality. The two authors consider the large-scale perspective of text- 
mining congruent to sociology of knowledge approaches (Stulpe and Lemke 2016: 
28-30). This is because both create a distance to the research objects. Therefore, the 
distant reading part would not just provide a pre-structuring of data for heuristics, 
but it would also bring forth analytical insights. Stulpe and Lemke see this potential
for analyzing semantics and discourses alike. They regard close reading as a
means of quality check, looking for any contradictions between the results of dis- 
tant reading and hermeneutic examination (Stulpe and Lemke 2016: 55). 

From a theory/philosophy of science point of view, such contributions offer in- 
novative perspectives on methodology for the digital humanities in general and dis- 
course analysis in particular. They do so by converging different traditions and 
cultures of research for a genuine methodology of digital humanities research - be- 
yond any mere adaptation of methods that have been developed in the computer 
or data sciences. Such contributions foster self-reflection and methodological depth 
in DH research, but they are largely absent, particularly regarding quantitative dig- 
ital methods. Taking this aspiration and the mentioned notions of scalable/blended 
reading seriously, distant reading does not add to or ‘enrich’ close reading. Instead, 
it is about a genuinely complementary relationship of epistemic importance that 
accommodates the need for an “update of hermeneutics” that Andreas Fickers demands
for digital history: Historians must face the task of critical reflection of
search algorithms, digitized sources, digital tools and interfaces. Without “thinking
in algorithms” [my translation], research would be in danger of losing evidence
and transparency when engaged with digital sources (Fickers 2020b: 167). This is 
because data and tool literacy should be considered necessary and logical exten- 
sions of traditional core components of historical research. These are, in particular: 


1) Heuristics 

In the sense of Johann Gustav Droysen: “Heuristics gather all the material we
need for historical examination; heuristics resembles the art of mining, to find
and to bring to light” (Droysen 1977: 400) [my translation]. This fundamental task
of all historical research has always been laborious. Historians must often
probe into vast amounts of primary sources, scattered in various archives to find 


8 Cf. Michael Piotrowski and Mateusz Fafinski, “Nothing New Under the Sun? Computational
Humanities and the Methodology of History,” 173-177 (paper presented at CHR 2020: Workshop on
Computational Humanities Research, Amsterdam, November 18–20, 2020), accessed October 6,
2022, http://ceur-ws.org/Vol-2723/short16.pdf.
anything relevant to their specific inquiry. The critical challenge is to gain an
overview, orientate and collect significant material. As for digitized or born- 
digital sources, information retrieval and text mining techniques analyze massive 
amounts of data under predefined parameters. These techniques assist manual 
and qualitative searches when metadata provide well-structured information on 
the sources; this may be information on the document’s creator, time and location 
of creation. A summary of the contents makes document filtering highly flexible 
and fast. Whatever the details of such digitally assisted heuristics may look like, 
the task still requires a great deal of attention, for the search parameters must be 
aligned to the sought material (which full-text keywords can be expected in a personal
file, a progress report on a building construction, or in parliamentary protocols?).
The parameters must also harmonize with the type of metadata provided
by the repository. Beyond that, different tools enable different searches, for in- 
stance, by employing a specific query language. All this demands consideration to 
prevent poor or biased results. Digitally-assisted heuristics is intense work. How- 
ever, flexible and well-structured searches through vast amounts of material help 
battle the traditionally challenging demands of heuristics. 
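
The interplay of metadata filters and full-text keywords described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Source` record, its field names and the sample entries are hypothetical and do not reflect the schema of any particular repository.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Source:
    """A digitized source with repository-style metadata (hypothetical fields)."""
    title: str
    creator: str
    created: date
    place: str
    text: str


def filter_sources(sources, *, start=None, end=None, place=None, keywords=()):
    """Keep sources that satisfy every given metadata constraint and, if
    keywords are given, contain at least one of them (case-insensitive)."""
    hits = []
    for source in sources:
        if start and source.created < start:
            continue
        if end and source.created > end:
            continue
        if place and source.place != place:
            continue
        lowered = source.text.lower()
        if keywords and not any(k.lower() in lowered for k in keywords):
            continue
        hits.append(source)
    return hits


# Invented sample records, for illustration only.
SAMPLE = [
    Source("Report A", "Building office", date(1924, 5, 2), "Berlin",
           "Progress report on the construction site."),
    Source("Debate B", "Parliament", date(1926, 3, 10), "Berlin",
           "Protocol of the debate on the republic."),
    Source("Letter C", "Private", date(1931, 1, 15), "Hamburg",
           "A personal note about the republic."),
]

berlin_republic = filter_sources(
    SAMPLE,
    start=date(1925, 1, 1),
    end=date(1930, 12, 31),
    place="Berlin",
    keywords=["republic"],
)
print([s.title for s in berlin_republic])
```

The sketch also shows why the search parameters need attention: a keyword that suits parliamentary protocols finds nothing in the construction report, and a place filter is only as reliable as the metadata behind it.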

A subsequent task of heuristics is organizing the collected sources in a way 
that supports their systematic interpretation. Historians must store the material 
alongside commentary notes in a structured fashion, best according to a data 
management plan and in a database. This enables them to keep track of the mate- 
rial’s relevance for different aspects of a research project. Why has it been col- 
lected, in the first place? Why is it interesting? Often, a document must be 
reconsulted to discuss it in a new context that has arisen while analyzing other 
material. Here, we deal with challenges for orientation again, for which metadata 
of the above-mentioned kind provide structured information. 
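
One way to keep sources and commentary notes together in a database is sketched below with Python's built-in `sqlite3` module. The table layout, column names and sample rows are assumptions made for illustration; an actual data management plan would define its own schema.

```python
import sqlite3

# In-memory database for illustration; a project would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        archive TEXT,
        created TEXT
    );
    CREATE TABLE note (
        id        INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL REFERENCES source(id),
        aspect    TEXT NOT NULL,
        comment   TEXT NOT NULL
    );
    """
)

# Invented records: "created" holds an ISO date, "aspect" names the part of
# the research project a source matters for, "comment" says why it was kept.
conn.executemany(
    "INSERT INTO source (id, title, archive, created) VALUES (?, ?, ?, ?)",
    [
        (1, "Editorial on the constitution", "Archive A", "1922-08-11"),
        (2, "Report on a party rally", "Archive B", "1925-03-01"),
    ],
)
conn.executemany(
    "INSERT INTO note (source_id, aspect, comment) VALUES (?, ?, ?)",
    [
        (1, "constitution", "Defends the constitution against attacks."),
        (2, "leadership", "Quotes the call for a strong leader."),
        (2, "constitution", "Mentions the constitution in passing."),
    ],
)


def sources_for_aspect(connection, aspect):
    """Return (title, comment) pairs for one research aspect, oldest first."""
    rows = connection.execute(
        "SELECT s.title, n.comment FROM note AS n "
        "JOIN source AS s ON s.id = n.source_id "
        "WHERE n.aspect = ? ORDER BY s.created",
        (aspect,),
    )
    return rows.fetchall()


for title, comment in sources_for_aspect(conn, "constitution"):
    print(title, "|", comment)
```

Because a source can carry several notes, the same document resurfaces whenever a new context makes it relevant again.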

As I will argue in the next sections, manual annotations expand the potential 
of digitally assisted heuristics. When investigating intricate objects like discourses, 
historians annotate the topics and contents of the collected source material. In
doing so, they enrich the metadata that can be retrieved in later searches. They
also create semantic relations between the primary sources when these sources
share a subject matter or discursive topoi. Linking sources in this way is powerful 
because it grants easy and quick access whenever historians must orientate and 
(re)consult documents. 
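
How annotations link sources can be illustrated with a small Python sketch. It assumes that each document has already been annotated with a set of topoi; the document identifiers and topos labels are invented, and the code does not use the data model of any annotation tool.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical manual annotations: document id -> annotated topoi.
ANNOTATIONS = {
    "vorwaerts_1922_08_11": {"constitution", "western democracy"},
    "vorwaerts_1925_03_01": {"western democracy", "leadership"},
    "regional_1930_09_15": {"leadership"},
}


def build_topos_index(annotations):
    """Invert the annotations: topos -> set of documents carrying it."""
    index = defaultdict(set)
    for doc_id, topoi in annotations.items():
        for topos in topoi:
            index[topos].add(doc_id)
    return dict(index)


def related_documents(annotations):
    """Map each pair of documents to the topoi they share (if any)."""
    links = {}
    for first, second in combinations(sorted(annotations), 2):
        shared = annotations[first] & annotations[second]
        if shared:
            links[(first, second)] = shared
    return links


topos_index = build_topos_index(ANNOTATIONS)
links = related_documents(ANNOTATIONS)

print(sorted(topos_index["western democracy"]))
for pair, shared in links.items():
    print(pair, "share", sorted(shared))
```

The inverted index answers the question "which sources address this topos?", while the pairwise links give a starting point for moving from one document to a related one.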


2) Source criticism 

Outer source criticism inspects how, why, and under which circumstances source 
material has been created and passed on to the present. For digitized or born- 
digital material, there are extra challenges to be considered. For instance, historians
must be aware of file formats because they precondition which analytic tools
to select and how to utilize them. Additionally, not every existent source has been 
digitized, so the work on digital resources may become too selective. 

Inner source criticism traditionally deals with the contents of source material 
and questions on what information can be gained. For digital sources, there are 
also metadata and its schemata to be considered. They predefine how historians 
may employ software tools for analysis, and they impact the results of such examination.
Aspects like these have raised awareness of a specifically digital source
criticism (Föhr 2019; Fridlund 2020; Hering 2014; Pfanzelter 2015).9


3) Interpretation 

The quest to find in-depth insights brings us back to the critical role of context, as 
Kutter addressed it. As contextualization is paramount already for source criticism, 
its importance increases in interpretation. For historical scholarship (beyond edit- 
ing or any sort of basic research), identifying linguistic properties, patterns or 
even trends of word use represents valuable ‘raw material’ for scrutiny. However,
it does not in itself represent any significant gain in knowledge. That gain comes from
interpreting such results, from asking what they mean when we make sense of the past.
For historians, interpreting events in the light of preceding and succeeding events
(diachronic contextualization) is as important for this task as respecting synchronous
political, social or cultural contexts. Zooming into the results of corpus analysis for
interpretation might subsequently be the departure for new digital analyses. This 
is because thorough reading and insights might raise new questions on the horizon 
of the given research project. Further search terms become relevant and new re- 
sources must enter the corpus. Therefore, the macro-perspective of zooming out 
and the micro-perspective of zooming in are not to be applied in a strict consecu- 
tive order. Instead, it is a repetitive process until no further loop seems worth- 
while. In a sense, this is a digitally enriched version of the hermeneutic circle — the 
iterative and deepening attempt of approaching a work's meaning through thor- 
ough perception, accumulated context information and interpretation. 

All these issues of heuristics, source criticism and interpretation demonstrate 
that digital techniques contribute to the “array of methods and the toolbox historians
have at their disposal” (Lässig 2021: 6), to perform nothing less than the discipline's
core tasks. Digital methods open up new possibilities of mastering these
tasks, but they also impose new challenges in terms of technological skills, on the 
one hand, and critical reflection of the expanded methodology, on the other 
hand. James E. Dobson addressed challenges like these and criticized that DH


9 “Living Handbook ‘Digital Source Criticism” ATLAS.ti, accessed May 10, 2022, https://atlasti.com.
research does not sufficiently reflect on the epistemic dimensions of digital methods.
Dobson particularly emphasized diachronic contexts that researchers
seldom consider when analyzing data. He also urged digital humanists to better
understand technical steps when applying digital techniques. Dobson's focus may 
be narrowed by his emphasis on quantitative methods, his critique of alleged 
structuralist and formalist assumptions in the DH, and his far-reaching disregard 
for research outside North America.10 He is, however, right in reminding us that
the “incorporation of digital methods into humanities research requires more 
methodological awareness and self-critique" (Dobson 2019: 6). 

I would like to argue that one way to tackle this task is to develop methodologi- 
cal frameworks for specific research domains, for instance, analysis of the Weimar 
Republic's political discourses. Such frameworks outline epistemic interests, theoret- 
ical implications, the material to be analyzed and analytic procedures. They then re- 
flect on how digital — in conjunction with analog — methods accommodate this kind 
of research. Methodological frameworks are broader than concrete workflows, for 
which they serve as a foundation. They need adaptation for specific research projects
and unique research questions. Thus, despite their conceptual elaboration,
methodological frameworks are, to a certain extent, eclectic models and work-in- 
progress. On the other hand, such frameworks are more concrete than generic re- 
flections on distant or scalable reading per se or on techniques like topic modeling. 
This is because they stress the instrumental function of digital methods for a defined 
research area, tailoring these methods to the needs of that research area. At the 
same time, they make purposeful application easier. 


1.4 Capturing political discourse in the Weimar 
Republic - towards a heuristic framework 


This paper outlines first thoughts on a methodological framework based on the 
considerations above. As a conceptual proposal, I intend to reflect on primarily 
qualitative analyses of political discourse in the Weimar Republic. To be more 
precise, the framework aims at identifying statements that countered anti-liberal 
and anti-republican discourse by Germany's far-right. It focuses on keyword 
searches performed on newspaper repositories and manual selections of newspaper
articles. This combinatory approach is meant to crystallize a selection of


10 Cf. Evelyn Gius, “Digital Humanities as a Critical Project: The Importance and Some Problems
of a Literary Criticism Perspective on Computational Approaches,” review of Critical Digital
Humanities: The Search for a Methodology, by James E. Dobson, JLTonline, January 24, 2020, 11-12,
accessed June 28, 2023, urn:nbn:de:0222-004298.
relevant text material for the analysis of democracy discourses, a selection to build
a digital and structured text corpus. Manual annotations enrich the corpus with 
information on discourses, structuring and relating the texts for qualitative analy- 
sis. All this aims at making pro-democratic attitudes articulated in the vast and 
complex landscape of Weimar Germany's newspapers more accessible than be- 
fore for historical interpretation. 

Research on Weimar's political discourses has shown that right-wing rhetoric 
vilified the political system as “western” and deeply “non-German,” thus employ- 
ing a culture-based language. Criticizing Germany's first democracy was part of 
an identity agenda, advocating for the strict rule of a leader and a strictly hierar- 
chical order as political and social alternatives to the status quo. While several 
historical studies have addressed such anti-republican and anti-liberal state- 
ments, the defenders of Weimar Germany have received less attention in dis- 
course history. Therefore, I focus on discourses that pick up or criticize the far- 
right rhetoric to get a clearer picture of one strand of pro-democratic discourses 
in the Weimar Republic. The material base for this consists of newspapers as integral
parts of a highly polarized and fragmented landscape of harsh political discourse.
Weimar's newspapers formed an important arena for expressing and consuming 
political ideas. 

While newspapers of the Weimar era have already been examined primarily 
in regional discourse studies, my proposal for a methodological framework has a 
broader scope. The approach does not favor or exclude any newspapers. However,
it must be considered that gazettes of the Weimar era have only partially
been preserved, and even fewer have been digitized. The digitized papers mostly have huge
gaps between the years and issues. One could argue that de facto we must limit
the scope to regional or other contexts to make justifiable selections of papers apt 
to answer relevant research questions. I agree from the perspective of empirical 
research. Notwithstanding, I would object that methodological proposals of a 
broader scope still are worthwhile in terms of what Fickers calls “thinkering:” the 
combination of experimenting with methods (“tinkering”) with theoretical reflection
(“thinking”) on this practice (Fickers 2020a). I would argue that “thinkering”
methodological frameworks are even more justified in the face of ongoing digiti- 
zation of historic press media, as it is happening in many countries with great 
effort. In the German context, the recently founded Deutsches Zeitungsportal 
stands out. As a central portal for historic German newspapers, it brings together 
digitized collections of myriad archives and libraries. On the one hand, the ever- 


11 “Deutsches Zeitungsportal,” Deutsche Digitale Bibliothek, accessed October 10, 2022,
https://www.deutsche-digitale-bibliothek.de/newspaper.
growing availability of digitized newspapers and increasing interest in digital
press analysis justify the development of frameworks in good time to make use
of the available resources.12 On the other hand, timely developed frameworks can
directly be applied to analyze new resources as they become digitized. 

Having said this, any methodological framework should, indeed, be tested on 
relevant digital resources and appropriate use cases. Their analysis delivers empirical
insights that should always go hand in hand with theoretical reflection. A
promising and feasible example is the analysis of social democratic discourse. Social
democrats stood at the forefront of Weimar's democracy and its defense.
They used their Berlin-based party organ Vorwärts as a mass medium of political
discourse. The Friedrich Ebert Foundation has digitized every issue from 1876 to 
1933.13 The resources are available with OCR in Deutsches Zeitungsportal. Due to
its completeness, Vorwärts is a good material base for identifying and tracking 
social democratic discourse over time. From the discourse research point of view, 
one might object that this focus is one-sided and material-driven. Indeed, Vor- 
wärts is just an individual newspaper, and social democratic debates also hap- 
pened elsewhere. Narrowing the analysis in that way cuts connections to the 
broader discursive space and blurs overlapping discourses. Furthermore, projects 
that concentrate on “the digitally available" may give the impression of comfort- 
able enterprises, ignoring too many not (yet) digitized resources. However, since I 
make a methodological proposal sketching a conceptual framework of how to 
conduct discourse analysis with digital techniques, this objection does not apply. 
The framework is flexible enough to cover other newspapers, and even other 
types of writing. It also suggests how to manually digitize and integrate newspa- 
per articles as qualitative selections from archival records. 

I agree with Kutter's understanding cited above that large-scale analysis is of 
explorative and heuristic value. Zooming out provides us with a rough overview, 
and it hints at promising constituents of discourse, not necessarily expected there 
but awaiting close inspection by zooming in. I intend to sketch how applying digi- 
tal techniques can reach that goal. In doing so, I broadly adopt the methodological 
framework that Sarah Oberbichler (2020) developed to analyze anti-migrant dis- 
course in South Tyrol's contemporary history. Oberbichler convincingly demon- 
strated how to grasp discourse in newspaper corpus analysis, mainly using the 
tool Atlas.ti14 for investigation. The framework I propose is broadly oriented toward
Oberbichler's research design but makes adjustments to take Weimar's complex and


12 As the most recent contribution to this research vein see Bunout et al. 2022. 

13 “Digitalisierungsprojekt ‘Vorwärts bis 1933’,” Friedrich-Ebert-Stiftung, accessed October 7,
2022, https://www.fes.de/bibliothek/vorwaerts-blog/vorwaerts-digitalisierung.

14 “ATLAS.ti,” ATLAS.ti, accessed May 10, 2022, https://atlasti.com. 
polarized discursive landscape into due consideration. In contrast to Oberbichler,
I take CATMA15 as the central tool of choice. Like Atlas.ti, CATMA is primarily designed
for qualitative text annotation and analysis, though the functionalities of
both tools also support quantitative research. They facilitate collaborative work- 
flows of text annotation or individual annotation to categorize text parts in pri- 
mary sources and attribute information to them. This enriches the material 
semantically, and it creates relations between text passages. Scholars may explore 
the annotations by customized search parameters and analyze the results with a 
set of built-in visualization features. CATMA has the benefit of being a free-to-use 
tool that brings all the features needed to qualitatively analyze and visualize dis- 
course data. Furthermore, it provides extended means to evaluate annotations by 
the programming language Python: GitMA16 is a Python package utilizing the distributed
version control system Git to flexibly access, process, analyze and manipulate
annotations. As another contrast to Oberbichler, I reference Critical Discourse 
Studies (CDS) to develop my discourse analytic perspective. This socio-linguistic 
field focuses on social relations of power and is well suited to analyze the use of 
language of culture. More precisely, I follow the Discourse-Historical Approach 
(DHA), for it focuses, among other things, on qualitative analyses complemented
by quantitative methods such as text linguistic techniques in methodological triangulation.
It also takes longer time spans under scrutiny (Reisigl and Wodak
2016), which is well suited to tracing the evolution of democracy discourses along the
course of Weimar's eventful history. Oberbichler instead chose another branch of 
discourse analysis that focuses on argumentation strategies and patterns, namely 
the Düsseldorf School of discourse analysis as it developed from the work of Martin 
Wengeler (2003). The theoretical perspective of the DHA steers the “digital lens” to
parts of the corpus that are to be examined. It plays, therefore, a fundamental role 
in the scalable reading framework. 

Taking up the metaphors of “zooming” and the “digital lens,” I borrow from
the vocabulary of movie production to make the conceptual implications of the
framework clearer: The first section of my paper outlines a “screenplay” that
serves as a foundation for the heuristic framework. It engages with the state of
research and, on this basis, formulates an epistemic interest that any (discourse) 
study, ultimately, must formulate. Here is also the place to give remarks on the 
intended “camera perspective,” meaning the DHA viewpoint that is to be applied
to the investigation of the digital resources. Using the language of movie


15 “CATMA,” CATMA, accessed May 10, 2022, https://catma.de/.
16 “GitMA,” CATMA, accessed October 6, 2022,
https://catma.de/documentation/access-your-project-data/git-access/gitma/.
production in this way simply underscores that my framework employs perspectivity.
Weimar’s political discourses form a specific research domain and the DHA im- 
poses concrete concepts and epistemic interests. All this necessarily affects the 
methods I outline. I do not present a “generic” or “neutral” framework for general 
discourse analysis, because digital and analogue methods always have an instru- 
mental purpose in the context of a specific research interest and, therefore, are 
set up by choices and adjustments. Against this background, I consider the “camera”
a better metaphor for the perspectivity of digitally assisted methods
than the popular “microscope” or “telescope.” Discourse in the Weimar
Republic is not observed, it is instead “captured.” 

The second section is dedicated to the basic structuring of data (Stulpe and 
Lemke 2016: 43-45; Oberbichler 2020: 471, 474-476), which is necessary for corpus-building:
In a first step, we would have to address data management as an organizational
prerequisite for later data analysis and documentation. Next, we need to
spot relevant source material, which ultimately depends on the imposed theoreti- 
cal perspective. This searching for material I metaphorically refer to as "location 
scouting." To fulfill this task, I propose manual selections as well as defining key- 
words that would presumably, but not necessarily, occur in the discourses of in- 
terest. Frequency analyses of such search terms yield a rough idea of where to 
find relevant text passages, beyond the manually selected material. The manual
and analog searches form the foundation at this heuristic stage, first because
the relevant newspapers and newspaper issues are just partially available
as digital resources. Second, the phrasing in the newspaper articles might deviate
from the keywords. In this context, not only do term occurrences necessitate
a closer look, but the absence of occurrences might also be interesting when
we would expect a specific word used in a given context. Additionally, we should
consider “negative keywords,” meaning terms that we would hardly expect regarding
the convictions of pro-democrats because their political opponents usually
utter them. Hits in frequency analysis might surprise us or simply indicate
where pro-democratic statements picked up the phrasing of political rivals for 
counter-statements. Both cases are informative for discourse analysis, and both
merit a closer look.
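
The “location scouting” by frequency analysis can be sketched as follows. This is a minimal Python illustration under stated assumptions: the article texts are invented English toy sentences, the keyword lists are placeholders, and matching is done on exact lower-cased tokens, so inflected forms and OCR errors, which matter a great deal for historical German newspapers, are ignored here.

```python
import re
from collections import Counter

# Invented toy data: (year, article text). Real input would be OCR full text.
ARTICLES = [
    (1919, "The republic is the form of state of our democracy."),
    (1923, "They call democracy western and cry for a leader."),
    (1923, "The constitution protects the republic."),
    (1930, "This report says nothing on the form of state."),
]

# Placeholder term lists: expected keywords and "negative keywords".
KEYWORDS = {"republic", "democracy"}
NEGATIVE_KEYWORDS = {"western", "leader"}


def tokenize(text):
    """Split a text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def yearly_counts(articles, terms):
    """Count occurrences of the given terms per year; years without any
    hit are kept with an empty counter so that absences stay visible."""
    counts = {}
    for year, text in articles:
        counter = counts.setdefault(year, Counter())
        counter.update(token for token in tokenize(text) if token in terms)
    return counts


keyword_hits = yearly_counts(ARTICLES, KEYWORDS)
negative_hits = yearly_counts(ARTICLES, NEGATIVE_KEYWORDS)

# Years in which none of the expected keywords occurs at all.
silent_years = sorted(year for year, c in keyword_hits.items() if not c)

for year in sorted(keyword_hits):
    print(year, dict(keyword_hits[year]), dict(negative_hits[year]))
print("years without keyword hits:", silent_years)
```

Keeping empty counters is a deliberate choice: a year without hits is not dropped from the overview but flagged, since the absence of an expected term can itself call for close reading.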

After that, the corpus can be compiled. To build thematic sub-corpora, dedicated
to specific topoi of discourse (e.g., “western democracy”) or thematic emphasis
(anti-republicanism in conjunction with antisemitism), the digital resources
need qualitative annotation. This “pre-processing” is the procedural basis for
deeper inquiry. Language use, both against the Weimar Republic and in defense of
it, is to be visited, as the third section outlines. It is also revisited, because the findings
of prior research are questioned in order to identify new discursive connections,
or are extended by analyzing newspaper articles that have
remained out of scope of traditional discourse analyses. During this step of the heuristic
framework, historians must still find original gazettes in archives. In qualitative
searches, they may define cross-sections oriented toward specific dates of political
importance. The material retrieved in this way may then be digitized and processed using
tools like Nopaque.17

The versatile “digital lens” makes it possible to quickly “pan” from one indi- 
vidual text passage to another or to thematically related material. This can be 
helpful for source criticism and interpretation whenever we “have a tilt” at a specific
text segment by close reading, and when we find new articles of other newspaper
issues worthwhile consulting. This is the case, for instance, when we
examine the treatment of a political event in newspapers of different political
orientation or if we make diachronic comparisons. The latter is the case when comparing
pro-republican statements of Vorwärts shortly after Weimar’s Constitution
came into effect with statements of the same paper on the annual commemoration
of the constitution (Verfassungstag, “Day of the Constitution,” August 11, 1921 to
1932). Complementarily, visualization of the distribution and semantic relations
between the annotated material provides orientation for “panning” and for find- 
ing new interesting discursive connections. An interactive visualization opens up 
a “bird's-eye-view” for that and, at the same time, makes it possible to “zoom” 
into the “worm’s-eye-view”. 
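
A diachronic cross-section around the Verfassungstag can be prepared with a few lines of Python. The sketch assumes that each newspaper issue is known by an identifier and a publication date; the identifiers are invented, and the window of two days before and after August 11 is an arbitrary choice for illustration.

```python
from datetime import date, timedelta


def constitution_day_windows(first_year=1921, last_year=1932, days_around=2):
    """Return, per year, the (start, end) dates of a window around August 11."""
    windows = {}
    margin = timedelta(days=days_around)
    for year in range(first_year, last_year + 1):
        day = date(year, 8, 11)
        windows[year] = (day - margin, day + margin)
    return windows


def select_cross_section(issues, windows):
    """Group issue identifiers by year if their date falls into that
    year's window; issues outside every window are left out."""
    selected = {}
    for issue_id, issue_date in issues:
        window = windows.get(issue_date.year)
        if window and window[0] <= issue_date <= window[1]:
            selected.setdefault(issue_date.year, []).append(issue_id)
    return selected


# Invented issue list: (identifier, publication date).
ISSUES = [
    ("issue_a", date(1922, 8, 11)),
    ("issue_b", date(1922, 9, 1)),
    ("issue_c", date(1929, 8, 12)),
    ("issue_d", date(1919, 8, 11)),
]

windows = constitution_day_windows()
cross_section = select_cross_section(ISSUES, windows)
print(cross_section)
```

The resulting groups can then be read side by side, year by year, which is the kind of “panning” between related material that the comparison requires.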

All these steps are to support thorough discourse analysis according to CDS 
and the DHA, with the historian bringing in contextual considerations and critical 
interpretation, thus capturing discourse through the digital lens. 


2 Screenplay: Countering anti-democratic attacks 


2.1 Epistemic interest: Defining and defending democracy 
in the Weimar Republic 


The era of the Weimar Republic counts as one of the best-examined periods of 
German history, with early research focusing on the demise and failure of Ger- 
many’s first democracy. National Socialism served as the vanishing point for his- 
toriography, and this tendency developed at times when the Federal Republic of 
Germany engaged in democratic self-assurance after 1945. Weimar served largely 
as a negative contrast for Germany’s second parliamentary democracy (Ullrich 


17 “nopaque,” Bielefeld University, SFB 1288: Practices of Comparing, accessed May 10, 2022, 
https://nopaque.uni-bielefeld.de/. 
2009: 616). Fritz René Allemann’s (1956) famous words “Bonn ist nicht Weimar”
(“Bonn is not Weimar”) represent the belief of many political observers in the old
Federal Republic, according to Dirk Schumann (2017: 102). At the same time, Allemann’s
quote was a central reference for many subsequent studies on the history
of the Weimar Republic. As Ursula Büttner argues, Zeitgeist and the development
of Weimar research have always had a strong and clearly visible interdepen- 
dency (Büttner 2018: 19). 

After the First World War, Germany’s economic, political and social life was 
burdened by tremendous structural and event-driven problems, despite intermedi- 
ate tendencies of stabilization. Against this background, early historiography nour- 
ished the narrative of “crisis” for the young republic (Peukert 1987: 282). On the one 
hand, this is not surprising, given the radicalizing political and social development
and the republic’s dramatic end. On the other hand, more recent positions have in- 
creasingly criticized the one-sidedness of that narrative (cf. esp. Föllmer and Graf 
2005). Around the new millennium, the Weimar era’s image became one of an
era in its own right, with historians emphasizing chances of consolidation and progress.
Whereas “anti-democratic thinking in the Weimar Republic” (Sontheimer 1962)
had been of pronounced interest before, now democratic forces, “democratic thinking”
(Gusy 2000), and the multi-faceted “understanding of democracy” (Braune et al.
2022) gained more attention. This change began when Germany represented a
mature and stable democracy, after the Cold War, and in a globalized world. Back
then, the turbulent years between the World Wars seemed less fitting for political
references (Schumann 2017: 102). Subsequently, recent research has taken up underexposed
aspects of Weimar's history, such as rural society, religious life, mass culture,
youth culture or international aspects of Weimar's interpretation of modernity
(Rossol and Ziemann 2021).18 Political culture with respect to democracy's chances
has become as significant a concern as the contingency of Weimar's fate (e.g.
Canning et al. 2013; Hacke 2018; Hacke 2021; Hardtwig 2005b; Lehnert and Megerle 
1990; Schumann et al. 2021). 

Despite such reorientation, the “crisis” has not entirely vanished. If anything, 
we find the image of a contested democracy with chances and failure often close 
to each other, as Franka Maubach summarized it (Maubach 2018: 5). This opinion 
has become pertinent against the backdrop of contemporary political and social 
developments: Western democracies are facing massive attacks on their values 
and institutions. Those attacks primarily come from the far-right and challenge 
democratic culture, urging the respective societies to engage in self-defense and 


18 For an overview of former and recent tendencies in Weimar research see Kolb and Schu- 
mann 2022. 
self-assurance. To do so, debates have frequently referred to warning examples
of failed and fallen democracies of Europe’s 20th century. The Weimar Republic 
plays a particularly important role for that in Germany, catalyzed by its current 
centennial jubilee. As seismographic feedback on the relevance of these debates 
in society, historians have questioned if we really can observe “Weimarer Verhältnisse?”
(“Conditions like in Weimar?”) (Wirsching et al. 2018) and stated that
“Berlin ist nicht Weimar” (“Berlin is not Weimar”) (Schumann 2017). In such
publications, often directed toward a larger audience, historians bring in their ex- 
pert knowledge and often warn that comparisons have their limits. On the one 
hand, they acknowledge certain structural similarities with, for example, aggres- 
sive right-wing attacks exploiting mass media to spread anti-governmental dis- 
course. On the other hand, they criticize anachronisms that disregard distinctive 
differences between Weimar’s and present Germany’s social conditions, demo- 
cratic systems and political cultures. 

This brief characterization of major research strands brings us back to the 
relevance of analyzing discourse on democracy in Weimar’s mass media. Bern- 
hard Fulda enriched the debate with his study on Berlin and its surroundings, 
while focusing on political newspapers, tabloids, and the local press of Berlin’s 
surrounding area (Fulda 2009). In doing so, Fulda showed that the major newspapers
generally had little impact on voting decisions by the masses but a noticeable
impact on politicians as professional readers and on political and parliamentary debates.
Karl Christian Führer addressed similar questions for Hamburg, investigating po- 
litical effects of the press and anti-republican discourse on readers (Führer 2008). 
Local newspaper studies like these have significantly enriched our knowledge of 
Weimar’s political culture, as have other discourse analyses of specific topics and 
topoi such as “Volksgemeinschaft” (Wildt 2009) (‘people's community,’ ‘folk community,’
or ‘racial community’), or antisemitism in the Reichstag (Wein 2014).
Further studies apply a selective focus on the early period of the Weimar Repub- 
lic (Kämpfer et al. 2014; Lobenstein-Reichmann 2014), or they concentrate on a po- 
litical camp such as leftist parties (Seidenglanz 2014). 

The framework I propose here is meant to be a methodological contribution to 
this area of research, not limited to any local or temporal context. It has, however, 


19 Proper translation depends on the speaker’s political standpoint and contextual use of the 
term. It was prominently — but not exclusively — used by Germany’s far-right. The idea of German 
unity in a “Volksgemeinschaft” had become popular since the First World War. For an introduction 
to the extensive research and academic debate on this term see Mergel 2005; Bajohr and Wildt 
2009; Wildt 2012; Michael Wildt, “‘Volksgemeinschaft’: Version 1.0,” Docupedia-Zeitgeschichte, 2014,
accessed October 13, 2022, https://doi.org/10.14765/zzf.dok.2.569.v1; Kershaw 2011; Schmiechen- 
Ackermann 2012; Uhl 2021. 
a specific focus on discourse contents because any discourse analysis approach
must live up to guiding epistemic interests and perspectivity. The proposed frame- 
work addresses the defenders and supporters of liberal representative democracy 
as it was institutionalized in Weimar's political system. Other concepts, such as the 
council republic favored by the far-left, are excluded. It is the supporters of the 
Weimar Republic whom I have in mind and who deserve more attention from historians.
More precisely, I would like to promote a concept of discourse analysis that
focuses on how defenders of liberal and representative democracy reacted to 
far-right rhetoric of “non-German” liberal democracy and culture-based anti-republican
discourse. This approach could also focus on how far-left attacks were
countered. For Weimar's discourse history this would make a lot of sense, since the 
political factions did not just attack opponents on the other side of the political 
spectrum. On the contrary, prior research revealed that groups on the same side of 
the spectrum were seen as competitors, and they harshly attacked each other. How- 
ever, I exclusively address pro-republican and far-right discourse in this paper, for 
this represents a clear dichotomy in terms of basic political convictions. Reactions 
and democratic counteroffers to right-wing discourse, therefore, stand at the center 
of the framework. Case studies with a specific regional or national scope may make 
use of it and adapt it. Such projects would shed light on the republic-friendly
discourse contributors' concrete understanding of democracy. What “democracy”
meant and how it should be institutionalized was highly controversial across the 
political camps and even within social milieux of Weimar Germany. Thus, discourse 
analysis promises to enhance research on political culture by providing a sharper 
picture of political ideas in the Weimar Republic. 

In this context, Thorsten Eitz and Isabelle Engelhardt present a rich linguistic 
analysis of discourses (Eitz and Engelhardt 2015), including a chapter by Eitz on 
the disputed form of government (Eitz 2015). Here, Eitz presents detailed results
from his extensive newspaper inquiry, with particular regard to the political
press. He carved out the polysemic use of the terms “democracy” and “republic.”
In doing so, Eitz identified significant “flag words,” which the political camps
used for their agendas, not least for the republic's defense or attacks on it. The 
results, however, rather represent a linguistic account of discourse properties, 
lacking extensive historical interpretation. This might not have been Eitz's goal. 
However, according to historical discourse analysis, one would expect to learn 
more about contextualization of the examined utterances, discursive relations be- 
tween them, and pronounced historical interpretation of the overall results. 
Thomas Mergel demonstrated that for the terms “Führer,” “Volksgemeinschaft,”
and “Maschine” (Mergel 2005). Mergel convincingly argued for this selection by
pointing out that the three terms counted as important for various political 
camps. While intensive use of the terms does not imply the same meaning for all 
discourse participants of the political spectrum, their use nevertheless reveals a
set of political expectations and hopes behind the utterances. Mergel interpreted 
the (different) usages as signifiers for shared topics, ways of speaking and a 
shared perception of political reality. Against this backdrop, “Volksgemeinschaft,” 
for instance, cannot count as a right-wing term per se. Instead, it was a projection 
surface. The political right utilized it to sell their strictly hierarchical and racial 
interpretation of the term. As another interpretation, pro-democratic political dis- 
course accentuated the ideal for the parliamentary system to represent the 
“Volksgemeinschaft” with all its social diversity — an ideal that had not yet been 
fulfilled in the eyes of many discourse participants. 

To conclude, the framework I present in this paper is a first methodological 
approach sketching heuristic means to render detailed discourse analysis possi- 
ble. It focuses on defenders of liberal democracy against the far-right. In this way, 
I intend to contribute to the methodology of a prolific research strand that faces 
much uncharted terrain. Digital analysis techniques, on the other hand, have
hardly been utilized for the discourse history or the history of newspaper
discourses of the Weimar Republic. Therefore, I argue that scalable reading has a
lot to offer for the heuristic exploration and innovative inquiry of political dis- 
courses in Weimar Germany. 


2.2 Camera perspective: A theoretical viewpoint from Critical 
Discourse Studies (CDS) 


CDS or Critical Discourse Analysis (CDA) is a research area that Kieran O'Halloran
defined as “a branch of linguistics that is concerned, broadly speaking, with
highlighting the traces of cultural and ideological meaning in spoken and written
texts” (2003: 1). CDS falls into a multitude of approaches with different methods
and research programs. Eclectically drawing on a range of theoretical traditions 
(cf. Forchtner and Wodak 2017), the underlying goal of all approaches is “to under- 
stand the complex workings of language within society, a concern for how socio- 


20 Giulia De Paduanis’ Master thesis on the Aachener Anzeiger is an exception. De Paduanis
focuses on language changes over time, and how to interpret them in the context of political and
societal discourses. The epistemic interest of the study is to deliver historical insights in the face 
of contemporary challenges for democracies. De Paduanis analyses a sample of one newspaper 
issue per month for the Weimar era by applying a scalable reading approach with Voyant Tools. 
Giulia De Paduanis, “Learning from the Past: The Case of the Weimar Republic: A Proposal for
Historical Analysis, Revision and Digitization” (Master thesis, Department of Cultural Sciences,
Linnaeus University, 22.01.2023).
cultural structures influence and, at the same time, are influenced by, language
use” (Forchtner and Wodak 2017: 135; cf. Fairclough and Wodak 1997: 258). Wodak
specified that “CDA highlights the substantively linguistic and discursive nature of
social relations of power in contemporary societies. This is partly the matter of how
power relations are exercised and negotiated in discourse” (Wodak 1996: 18 [emphasis
in the original text]). Against this background, the relationship between discourse
and power is situated on several “levels,” as Bernhard Forchtner and
Wodak pointed out: “Yet, approaches generally view power as being present ‘in
discourse’ (some positions will hold greater potential to influence others), ‘over
discourse’ (for example, the question of access and agenda setting), ‘and of discourse’
(an understanding of power which points to latent conflicts [. . .]” (Forchtner and
Wodak 2017: 135). The emphasis on culture-based language use and power makes 
CDS instructive for analyzing discourse in political culture. While CDS has broadly 
been applied to the analysis of contemporary discourses, it can also be used for 
historical discourse analysis (e.g., Richardson 2017). 

More precisely, I follow a definition of discourse that Martin Reisigl and 
Wodak formulated for the Discourse-Historical Approach (DHA) of CDS. The two 
authors regard “discourse” as:

- “a cluster of context-dependent semiotic practices that are situated within
  specific fields of social action;
- socially constituted and socially constitutive;
- related to a macro-topic;
- linked to argumentation about validity claims, such as truth and normative
  validity involving several social actors with different points of view.”
  (Reisigl and Wodak 2016: 27)


This definition is fitting for the examination of Weimar's political discourse
landscape. While Reisigl and Wodak highlight argumentation in the last point, it is
noteworthy that this also addresses different modes of language use. For
instance, one might find arguments that have an ideological tone, trying to justify
why democracy would be “non-German.” Or the tone is more pragmatic,
emphasizing that the democratic state brings political participation to the people.

For the heuristic framework as presented in this paper, pro-democratic state- 
ments are understood as a means to (re)gain power within the polarized discur- 
sive landscape of Weimar's contested democracy. The tone of this landscape was 
more than controversial; it was oftentimes harsh. Hateful and defaming attacks 


21 Forchtner and Wodak draw from Steven Lukes's “three-dimensional view of power” (Lukes
2005: 25-29).
were the norm. The far-right employed a language of culture and identity to
discredit the republic as “foreign,” “western” or “French.” In racial terms, it was
frequently characterized as “Jewish.” The goal was to stigmatize the political system,
to move the limits of the sayable in the political culture of the Weimar Republic. 
This shift of the sayable was meant to allow for new radical political changes and acts
that would, ultimately, get rid of the hated liberal democracy; this describes the
power dimension in far-right discourse. Having said this, pro-democratic opponents
should not be considered entirely defensive in their efforts to expose, counter
and substitute anti-democratic discourse. They took an active part in shaping Ger- 
many’s democratic culture - in coining what “democracy” should mean for post- 
war Germany. In that sense, pro-democratic statements, too, are to be regarded as 
a means of power within the discursive battles of Weimar political culture. 


3 Setting up the digital lens: Heuristics 
and data pre-structuring 


3.1 Data management plan 


Digital discourse analysis becomes more accessible and more structured when data
and metadata are stored and documented in an organized way, ideally using
versioning tools such as Git. At the same time, data management lays the foundation
for transparent data publishing, thus facilitating reuse and critical assessment by
other scholars. Finding an appropriate repository is another key component of
data management and reuse. All in all, this stage of the methodological framework
must comply with the fundamental FAIR principles: Findability, Accessibility,
Interoperability, and Reuse of digital (meta)data. Given the serial character
of newspapers and their local, regional, or national distribution, the primary
data naming parameter should be date, accompanied by location.
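
A minimal sketch of such a naming scheme might look as follows. The function names, the separator, and the order of the components after the date are assumptions made for illustration; they are not prescribed by the framework or by the FAIR principles.

```python
import re
from datetime import date

# Transliterate German umlauts so that file names stay ASCII-only.
_UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def slug(text: str) -> str:
    """Lower-case the text and replace every non-alphanumeric run with a hyphen."""
    ascii_text = text.lower().translate(_UMLAUTS)
    return re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")


def issue_filename(published: date, place: str, title: str, ext: str = "txt") -> str:
    """Build a file name with the ISO date first, then place, then newspaper title."""
    return f"{published.isoformat()}_{slug(place)}_{slug(title)}.{ext}"


# Hypothetical example issue: sorting by file name now sorts by date.
print(issue_filename(date(1929, 8, 11), "Berlin", "Vorwärts"))
```

Putting the ISO date first means that an alphabetical listing of the files is also a chronological one, which suits the serial character of newspapers.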


3.2 Identifying far-right discourse and keywords 


Research on Weimar's political culture has produced much knowledge about far- 
right discourse. The well-examined account of anti-republican topics and topoi re- 


22 “The FAIR Data Principles,” FORCE11, accessed May 19, 2022,
https://force11.org/info/the-fair-data-principles/.
veals a culture-based language use that features distinct keywords: “Führer”
(leader, as a political title), “Volksgemeinschaft,” “System,” “Organismus,”
“Liberalismus,” “Parlamentarismus,” “Demokratie,” “neue Freiheit” (‘new freedom’),
“neue Politik” (‘new politics’), “Judenrepublik” (‘republic of the Jews,’ ‘Jewish
republic’), to mention just a few. These keywords serve the identification and
analysis of discourses that pick up and counter far-right attacks. At the same time, the
list should include terms that previous research has emphasized for genuine pro- 
republican and democratic language use, such as Eitz’s “flag words.” Such terms 
do not just mark concepts of democracy and the republic, but they function as 
counter-concepts to the political opponents’ thinking. This relation must always 
be considered for the overall discourse on contested democracy. Relevant are, 
among others: “Demokratisierung” (‘democratization’), “Sozialismus” or the di- 
chotomic figure “Demokratie oder Diktatur” (‘democracy or dictatorship’). 

Later frequency analysis will utilize these terms and their grammatical variations,
enabling a first glance into the discourses of interest. It will have to be
followed by close reading of manually selected articles, as outlined in the following
sections. This is because paying attention to the occurrence of the keywords
alone ignores text passages with different phrasing that nevertheless are relevant in
terms of their discursive contents. Still, frequency analysis provides a first rough
overview and, simultaneously, gives an idea of where else to look.
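
How keywords and their grammatical variations could be matched is sketched below. The suffix list is a deliberately crude assumption about German inflection, and the sample sentence is invented. A real project would more likely rely on lemmatization or on the query language of the chosen tool.

```python
import re
from collections import Counter

# Three of the keywords named above; the list is only an excerpt.
KEYWORDS = ["Volksgemeinschaft", "Demokratie", "Führer"]

# Assumption: inflected forms differ from the base form only by a short suffix.
SUFFIXES = r"(?:e|en|er|es|n|s)?"


def keyword_pattern(keyword: str) -> re.Pattern:
    """Match the keyword as a whole word, optionally followed by one suffix."""
    return re.compile(rf"\b{re.escape(keyword)}{SUFFIXES}\b", re.IGNORECASE)


def count_keywords(text: str, keywords=KEYWORDS) -> Counter:
    """Count keyword occurrences in a text, inflected forms included."""
    return Counter({kw: len(keyword_pattern(kw).findall(text)) for kw in keywords})


sample = "Die Demokratie lebt. Die Feinde der Demokratien und des Führers."
print(count_keywords(sample))
```

Because of the word boundaries, compounds built on a keyword are not counted. This is one reason why such counts can only be a first rough overview.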

The keywords defined in this heuristic step should be organized in a data- 
base. They represent an existing vocabulary, carved out by prior research, and 
utilized for newspaper analysis. Christian Schneijderberg, Oliver Wieczorek, and 
Isabel Steinhardt referred to such a course of action as deductive approaches (both
quantitative and qualitative), whereas inductive approaches try to find the ana- 
lytical categories in the material (Schneijderberg et al. 2022). The goal of my 
framework is to combine the deductive and inductive. The latter is the case when 
exploration and close analysis reveal new topics and keywords. They must enter 
the database and new frequency analyses, which renders possible new insights 
into the nature of political discourses. Ultimately, this step marks the beginning 
of an iterative looping through the texts until no further loop appears necessary. 
This exploratory approach checks for more keywords, more discursive topoi and 
topics than previous research has addressed to this day. Moreover, it is an
attempt to gain a more detailed image of the discourses, given the enhanced
capacities of the computer to (1) quickly search through myriad texts, (2) let historians
flexibly jump from one passage to another, and (3) rapidly revisit text that
becomes interesting again in the light of passages examined later. These
are, fundamentally, heuristic benefits of the digital.
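
One possible shape for such a keyword database is sketched here with SQLite from the Python standard library. The table layout, the column names, and the labels "deductive" and "inductive" as stored values are assumptions for illustration only.

```python
import sqlite3


def open_keyword_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table that records each keyword and its provenance."""
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS keywords (
               term      TEXT PRIMARY KEY,
               origin    TEXT NOT NULL CHECK (origin IN ('deductive', 'inductive')),
               iteration INTEGER NOT NULL
           )"""
    )
    return con


def add_keyword(con: sqlite3.Connection, term: str, origin: str, iteration: int) -> bool:
    """Store a keyword once; return True only if it was not in the database yet."""
    known = con.execute("SELECT 1 FROM keywords WHERE term = ?", (term,)).fetchone()
    if known:
        return False
    with con:  # commits on success
        con.execute("INSERT INTO keywords VALUES (?, ?, ?)", (term, origin, iteration))
    return True


con = open_keyword_db()
add_keyword(con, "Volksgemeinschaft", "deductive", 0)  # taken from prior research
add_keyword(con, "Demokratisierung", "inductive", 1)   # hypothetical find from close reading
```

Recording the origin and the iteration of each term documents which keywords stem from prior research and which were found in the material. The looping can stop once an iteration adds no new term.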


3.3 Corpus compilation


A corpus comprises relevant material for in-depth discourse analysis, compiled
on the basis of the findings from the prior step. Therefore, corpus compilation is a
critical stage of filtering and gathering source material, thus fulfilling core demands of
heuristics, as I have characterized them above.

The corpus should be coherent and fit the project’s research question.
Suppose we want to analyze pro-republican discourse over time by the social 
democratic organ Vorwärts. In that case, we might integrate the complete collec- 
tion of issues for the Weimar era, since the paper is wholly digitized. This would 
allow for identifying significant articles and statements, even for dates and con- 
texts that one might not have anticipated. 

However, most other newspapers have only fragmentarily been digitized. 
This makes it challenging to conduct cross-newspaper analyses solely on digital 
collections. Historians still must confront newspaper articles in archives and 
manually digitize them when engaging in scalable reading. While this would be 
nearly impossible for quantitative analysis, given the vast amounts of relevant 
issues scattered over various archives, the task is more feasible for qualitative se- 
lections. Historians would have to create cross-sections, choosing material from 
specific dates and focusing on influential newspapers. These selections might con- 
centrate on critical political events, such as the assassinations of Germany’s for- 
mer secretary of the treasury, Matthias Erzberger, and foreign minister Walther 
Rathenau. Qualitative selections could also focus on the passing of essential acts,
international treaties or the Verfassungstag (‘Day of the Constitution’), Germany’s
national holiday from 1921 to 1932. On these occasions, the political discourse
intensified and often flared up, contesting Weimar’s political system in fundamental
debates. On the one hand, manual article digitization demands considerable 
extra effort. On the other hand, this challenge is outweighed, to a certain extent, 
by gaining flexible searchability within the collected material for later analysis. 
Scholars benefit from this structured accessibility by receiving more orientation 
when comparing different text parts and relating them to each other. They thus 
increase the heuristic value of the corpus. 

Tools for manual digitization have become user-friendly, even for those who 
are not tech-savvy. Nopaque, for instance, combines file setup, OCR (even HTR via
the Transkribus pipeline), NLP, and corpus analysis in an easy-to-use toolchain.


23 “Transkribus,” READ COOP, accessed May 10, 2022,
https://readcoop.eu/transkribus/?sc=Transkribus.
This makes the still laborious task of manual digitization and data processing
more manageable.


3.4 Keyword frequency analysis 


CATMA counts frequencies of the keywords and represents them in a distribution 
chart. Additionally, the tool counts all the newspapers that contain the keywords. 
These first statistical results provide a rough overview of occurrences and tempo- 
ral distribution of utterances. They are anchors for zooming into the search hits 
for close reading. 
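
The kind of result described here can be illustrated independently of CATMA. The following sketch does not use or imitate CATMA's interface. It only shows, on invented sample data, what a temporal distribution of hits and a count of issues containing a keyword amount to.

```python
from collections import Counter


def keyword_distribution(issues, keyword):
    """Return (hits per year, number of issues with at least one hit).

    `issues` is a list of (year, text) pairs; matching is a simple
    case-insensitive substring count, which is an assumption of this sketch.
    """
    hits_per_year = Counter()
    issues_with_hit = 0
    needle = keyword.lower()
    for year, text in issues:
        hits = text.lower().count(needle)
        hits_per_year[year] += hits
        if hits:
            issues_with_hit += 1
    return hits_per_year, issues_with_hit


# Invented miniature corpus for demonstration purposes only.
issues = [
    (1919, "Demokratie und nochmals Demokratie"),
    (1919, "Ein Bericht ohne das Stichwort"),
    (1929, "Die Demokratie wird verteidigt"),
]
per_year, n_issues = keyword_distribution(issues, "Demokratie")
print(dict(per_year), n_issues)
```

Years with many hits, and the issues behind them, are the anchors from which close reading can start.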


3.5 Identifying discourse topics by close reading 


After frequency analysis, the found text passages need thorough inspection to 
evaluate the hits on keywords that have been defined in step 3.2. This procedure 
determines which text parts really address the topics and topoi of interest and 
which are false hits. Complementarily, new relevant discourse topics and con- 
tents might become apparent. They must be documented and enter another itera- 
tion of frequency analysis and close reading. 

Other digital methods enrich the heuristic and pre-structuring steps conducted
so far. For instance, co-occurrence analysis and topic modeling can identify
anti-democratic terms that pro-democrats have picked up to counter them.
For example, the Social Democrat Hermann Wendel reacted to the far-right topoi 
of “Judenstaat” (‘Jewish state’) and “Judenkanzler” (‘chancellor of the Jews’) in an 
article of the Social Democratic newspaper Vorwärts in 1929 (Wendel 1929). Here, 
Wendel exposes such rhetoric as arbitrary and contradictory by showing that
even national heroes like the old empire’s chancellor Otto von Bismarck had been
defamed in that way. We find co-occurrences in the article hinting at citations
that describe “Bismarck” as an “Abkömmling von Juden und Krämern” (‘descendant
of Jews and grocers’; here, ‘grocer’ is an anti-capitalist pejorative). This
way, Wendel employs a strategy of mocking and delegitimizing far-right attacks 
on the Weimar Republic that operate with the same antisemitic rhetoric. Digital 
techniques that analyze the surrounding phrasing of a term or expression can 
help identify such patterns. They may also track down synonymic usages of dif- 
ferent words, revealing semantic networks in far-right and pro-democratic vocab- 
ularies. They also help identify distinctive connotations of a single term, as used 
in specific contexts. 
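
A simple window-based co-occurrence count, one of several ways to analyze the surrounding phrasing of a term, could be sketched as follows. The tokenization, the window size, and the sample sentences are assumptions of this illustration and are not taken from the sources discussed above.

```python
import re
from collections import Counter


def cooccurrences(text: str, target: str, window: int = 5) -> Counter:
    """Count the words appearing within `window` tokens of each target occurrence."""
    tokens = re.findall(r"\w+", text.lower())
    target = target.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


# Invented sample text for demonstration purposes only.
sample = "Die Republik ist die Demokratie des Volkes. Die Demokratie schützt die Republik."
print(cooccurrences(sample, "Demokratie", window=2).most_common(3))
```

In practice, function words such as articles would be filtered out with a stop-word list before the neighbours are interpreted.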


3.6 Compilation of sub-corpora


The information gained in the prior steps serves to define more specific
keywords for the explored discourse topics and topoi. These keywords help compile
thematic sub-corpora for specific discourse topoi or strands that refer to, for example,
antisemitic attacks on liberal democracy. Another sub-corpus might collect
sources that defend parliamentarism. Defining such specific sub-corpora increases
the visibility and findability of the texts and thus facilitates later qualitative
analyses. This is because sub-corpora support contextualization of statements.

The specified keywords are used in searching the whole corpus. Every text 
passage that returns a hit receives a respective annotation in CATMA. While 
doing so, the passages should be read carefully to define new relevant keywords. 
They enter search runs on the whole corpus to replenish the annotations. The 
process loops until no more keywords are identified, and no new hits appear for 
the source texts. 
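
The looping procedure can be summarized as a fixed-point iteration. In the sketch below, the function `suggest_keywords` is a hypothetical stand-in for the historian's close reading, which no code can replace. The document identifiers and sample texts are likewise invented.

```python
def compile_subcorpus(corpus, seed_keywords, suggest_keywords):
    """Search, read, and extend the keyword list until nothing new turns up.

    corpus:           dict mapping a document id to its text
    seed_keywords:    the specified keywords to start from
    suggest_keywords: callable(text) -> iterable of new keywords; it stands in
                      for close reading and is an assumption of this sketch
    """
    keywords = set(seed_keywords)
    annotated = set()
    while True:
        # Search the whole corpus with the current keyword list.
        hits = {
            doc_id
            for doc_id, text in corpus.items()
            if any(kw.lower() in text.lower() for kw in keywords)
        }
        # "Read" only the passages that were not annotated in an earlier loop.
        new_keywords = set()
        for doc_id in hits - annotated:
            new_keywords |= set(suggest_keywords(corpus[doc_id])) - keywords
        annotated |= hits
        if not new_keywords:
            return annotated, keywords
        keywords |= new_keywords


corpus = {
    "a": "Der Parlamentarismus wird verteidigt.",
    "b": "Die Volksvertretung arbeitet.",
    "c": "Wetterbericht.",
}
reader = lambda text: ["Volksvertretung"] if "Parlamentarismus" in text else []
print(compile_subcorpus(corpus, ["Parlamentarismus"], reader))
```

The loop ends as soon as a pass over the newly found texts yields no further keyword, which mirrors the stopping criterion described above.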


4 Zooming and panning: (Re)visiting text 
passages for new insights 


4.1 Discourse analysis: Examining and annotating 
the resources 


The above steps of heuristic assessment and pre-structuring are followed by her- 
meneutic analysis of the annotated text passages. With the DHA as the guiding 
perspective for that, the focus lies on conceptions of democracy that oppose
culture-based attacks on the republic (i.e., “the system of Weimar is a non-German
institution”). These statements should be examined with regard to their temporal,
local, political, and socio-cultural context. All matching passages of the whole cor- 
pus should get an annotation for the corresponding democracy concept. Project 
teams profit from CATMA's undogmatic capacities of collaborative annotation to 
find “gold standard” annotations.

Whether by teamwork or individual efforts, annotations should not depend
on the exact wording of the statements. Instead, the DHA aims at identifying
relevant semantic contents. This is the primary task of hermeneutic interpreta- 
tion at this stage of analysis. And this means that manual choices of text passages 
complement the keyword-based approach. As outlined above, qualitative cross- 
sections help identify significant articles that do not feature any anticipated 
phrasing, which keyword searches necessarily miss. 

Annotation of the resources' formal aspects is also relevant. These are, for
instance, date of publishing, type of text (e.g., article, readers' letters), etc. Annotating
and thereby documenting these features helps differentiate between the source 
types in further interpretations or when revisiting text passages becomes necessary. 


4.2 Structuring the annotated text passages 


The next step connects the instances of pro-democratic topoi by utilizing CATMA's 
query feature. It picks out every relevant text passage and displays the different 
topoi and their semantic relationships (e.g., when pro-democrats pick up the far- 
right statements "liberal democracy is alien" and "Jews control the republic"). 
CATMA visualizes the results as a word cloud or a distribution chart combined 
with Keyword-In-Context, thus providing a structured overview. Users can click 
on its elements and explore the annotated text passages in their original contexts. 
Users might also test different parameters for the display, such as specific
sub-corpora or types of text, to have a more precise view. Overall, the outlined
features are genuinely powerful in structuring the representation of the corpus and,
ultimately, heuristically supporting text interpretation.
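
For readers unfamiliar with the Keyword-In-Context format, a bare-bones version can be sketched in a few lines. This is not CATMA's implementation. The function name, the context width, and the sample sentence are assumptions made for illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 30):
    """Return (left context, match, right context) for every keyword occurrence."""
    rows = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


# Invented sample sentence for demonstration purposes only.
for left, hit, right in kwic("Wir verteidigen die Republik gegen ihre Feinde.", "republik", width=12):
    print(f"{left:>12} [{hit}] {right}")
```

Each row keeps the hit together with its immediate surroundings, so that a reader can decide at a glance which passages merit a return to the full text.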


4.3 Source criticism and interpretation 


Once annotated and structured, the texts are ready for thorough source criticism
and interpretation. The visualizations help quickly zoom out from an individual
statement to the context of the whole source text. They also help to jump to other
semantically related newspaper articles for criticism and interpretation, keeping
track of the manifold facets and contributors of pro-democratic discourse and con- 
texts. One statement can be interpreted in the light of another, and differently 
dated utterances can quickly be compared in diachronic inquiry. Regional specifics, 
too, may be considered by selecting only respectively annotated newspapers. Revis- 
iting text resources becomes relatively easy when new insights require repeated 
examination. The digitally implemented heuristics of this framework thus support 
context sensitivity and in-depth insight. 


5 Conclusion


Scale and zooming are metaphors that scholars of digital humanities and history 
use for diverse parts of research: data processing, analysis, knowledge representation,
methodological documentation, and more. In terms of text analysis, “scalable
reading” or “blended reading” stands for innovative approaches to combining 
large-scale examination and in-depth inquiry. As much as this general theoretical 
concept might sound convincing, the actual potential of scalable methods still man- 
ifests itself in research with a defined theoretical and methodological orientation 
and epistemic interest. Only instrumental use of scalable reading techniques can 
prove the benefits of “zooming in and out.” This is to say that scaling techniques — 
as any technique — are not per se productive but can only be fruitful for what they 
are employed for. 

This chapter attempted to bridge the level of general reflections on scale and 
the level of specific research projects that apply scale. On a meso level, I outlined
a methodological framework that is not intended as a blueprint to be strictly
followed. Instead, it sketches the heuristic foundation for explorative analyses of
pro-democratic discourse in the Weimar Republic. This framework surely needs 
refinement once empirical analyses address demands for applied methods. How- 
ever, it is my conviction that frameworks are productive tools when they are 
based on epistemic objects (here: historical discourses), address epistemic inter- 
ests (how did pro-democrats counter anti-democrats?), and put theoretical and 
methodological programs (the DHA) into practice. If digital analytic tools provide 
a "lens" for research, this lens must be set up and directed at objects of interest. 
The framework I outlined in this chapter is a proposal to do so. 

It is largely based on the instructive approaches that Oberbichler and Pfanzel- 
ter developed to analyze anti-migrant discourses in contemporary history, but it 
has several modifications. I agree with those commentators on close and distant 
reading that see in large-scale techniques a primarily heuristic value. This starts 
with finding relevant primary sources and ends with flexible possibilities to visit 
and revisit text passages of a corpus, supporting not just quantitative analysis but 
also qualitative inquiry. This is because we often become interested in repeated 
reading of texts when new insights bring up new aspects of the examined topic.
Further research questions may even arise. For the analysis of pro-democratic
discourse, this might mean that statements of older newspapers become more important
when diachronic comparisons to later articles reveal that the early texts
anticipated topics and topoi that are particularly relevant years later. One might 
say that this is perfectly possible with pure close reading. But given the intricate 
nature of discourse and the complex interconnections between many discourses, 
we profit a lot from the digital heuristic support, not least for applying the
hermeneutic circle. As a result, the heuristic framework outlined in this chapter is meant to
gain a clearer and deeper picture of pro-democratic discourse in Weimar Germany. 
Beyond that, the results provide transparent demonstration when highlighting the 
scaling steps, providing an overview by visualization, and publishing research data. 
This may take shape as a multimodal publication for enhanced transparency and
reproducibility.


Abstract: This article looks at the juxtapositions of Italian microhistory and digital
spatial history. While microhistory and digital history are in opposition to one
another in terms of scale, they have similar aims, particularly a commitment to a 
methodology whose partial purpose is overcoming the silences of the subaltern 
and underrepresented in the archive. Digital history can help microhistorians 
find the exceptional normal in a cache of documents, follow clues, and illuminate 
the mentalities, particularly the spatial ones, of their subjects. 


Keywords: microhistory, digital history, spatial history, GIS 


In 1562, in Modena, Italy, Sister Lucia Pioppi wrote in her diary the following verses: 


Do not grieve uncle, so many friends and family remain 

that will punish this senseless malice 

and will take vengeance so fatal that the infinite abyss will be amazed. 
(Bussi 1982: 42) 


When I read these lines for the first time, the incongruity of such vicious words
in an early modern nun’s diary was striking and uncanny. The rest of the diary is 
quotidian enough. She wrote of visitors to the convent parlors, larger geopolitical 
events, earthquakes, deaths, marriages, births and other news from relatives; the 
contrast between the furnishings of daily life and the revenge poem made these 
lines even more striking. Who were these friends and family? What was the 
senseless malice? These questions and how a nun came to speak of vendetta and 
revenge deserved an answer. 

After following the trail of clues from this poem in Sister Lucia’s diary, I dis- 
covered a complex story of fractious nuns, murders in churches, jurists breaking 
the law, and a bloody vendetta between two noble families, the Bellincini and the 
Fontana. To quote Carlo Ginzburg in the preface to The Cheese and the Worms: 
The Cosmos of a Sixteenth-century Miller, “As frequently happens, this research 
too, came about by chance.” (Ginzburg 1980: xv) Digging through the letters of
Modena’s governors to the Duke of Ferrara, chronicles, and family papers looking
for those named in Sister Lucia's poem, I discovered that these families had been in a
vendetta for over a century. There was abundant documentation on this vendetta,
thousands and thousands of pages’ worth, and even more valuable still—first-


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
hand accounts; eyewitness reports of vendettas, despite the ubiquity of the practice
in sixteenth-century Italy, are rare.

It was also by accident that I discovered that the Bellincini and Fontana were 
neighbors. While taking an introductory GIS for the Humanities course, I was 
looking for data to practice with and found the data for the Bellincini and Fontana
assassinations on my thumb drive. After a quick Google search, I was able to
find a sixteenth-century map which identified historic buildings, including family 
palazzi and churches. After plugging in the information and tracking down the 
coordinates on the geo-rectified map, I realized what I previously had not been 
able to put together—the warring factions lived around the corner from one an- 
other. Even though the factions would have encountered one another daily, no 
assassinations or fights had taken place on the streets they lived on. This chal- 
lenged my assumption that vendetta fights were spur of the moment, opportunis- 
tic encounters in the streets. Indeed, after I mapped the murders, I realized they 
were in public places—mostly piazzas and churches—and had been carefully
planned for months or years. By using the methods of microhistory and digital 
spatial history, I learned about vendetta. 

As a microhistorian in the field of Italian history and a digital historian pri- 
marily working on spatial history, I experience these two methodologies as mutu- 
ally enriching. While following the trail of a normative exception in the archive, I 
have often turned to digital history to map spaces, visualize patterns, and contex- 
tualize events. Such affordances, in turn, offer fresh paths of inquiry that must be 
pursued by more traditionally textured and confined investigation; the macro 
sometimes leads down avenues that can only be traversed by a turn to the micro. 
Nor is my experience particularly unusual; productive interactions between digi- 
tal history and microhistory are becoming more common. How this synergy 
might develop remains to be seen, particularly because of the key role digital his- 
tory currently occupies. As a methodological subdiscipline, digital history is a 
marked growth area and a driving force for change. If dissertation and hiring 
trends are a bellwether of methodological trends in the field, then narrative
microhistories are likely to push towards topics on a macro-scale: the Atlantic world,
the global, the transnational. Whether digital projects will be equally productive 
of microhistorical research is, as yet, unclear. 

Microhistory, for its part, retains a steady role but not necessarily a leading 
one. Prominent journals and academic presses regularly publish microhistories of 
the working classes, persons of color, colonial subjects and LGBTQA+ persons. This 
methodology in particular, and a disciplinary commitment to narrative in general, 
continue to excite scholars, inform graduate training and provide new insights. 
However, examining the fifty or so most well-known microhistory monographs 
written by Anglophone scholars in the last five decades, one would be hard pressed
to find junior scholars’ books among them; most were written by tenured faculty at
elite institutions. And while microhistory has benefitted from the proliferation of 
digital archives and access to new materials, it is apparent that early career schol- 
ars are not encouraged to write microhistories in the Anglophone world. They are, 
however, encouraged and incentivized to do digital work. The proliferation of 
source materials, digitized books and archival material at our fingertips, and the 
increasing pressure to do comparative work at scale, seem to have resulted thus far 
in a privileging of synthesis over narrative. However, these conditions may eventu- 
ally fuel new, more traditionally narrative work at all career stages. And if so, the 
particular strengths of microhistory will continue to matter. Indeed, in the writing 
of histories of subaltern subjects, for instance, the responsible and nuanced care of 
microhistorical methods will likely be crucial. Even as digital tools and methods un- 
earth a host of new potential stories, dealing with them responsibly will require 
telling them with texture and depth. 

Such present and potential realities should not, of course, obscure an essen- 
tial and mutually enriching compatibility that is already evident. While microhis- 
tory and digital history are in opposition to one another in terms of scale, they 
have similar aims, particularly a commitment to a methodology whose partial 
purpose is overcoming the silences of the subaltern and underrepresented in the 
archive. Many digital projects also aim to inject agency into the subjective.
Digital spatial history, particularly geospatial history using methods like GIS
and deep mapping, can help us get at particular modes of agency in ways that 
were more difficult before, fulfilling some of the original priorities of microhis- 
tory in unexpected ways. 


1 A macro-history of digital history 


Before the development of what is known as global microhistory, the geographic 
scale of microhistory was typically narrower in scope. In the past twenty years, 
digital methods have dramatically expanded the possible scale of historical re- 
search in ways that have transformed the profession, but not microhistory per se. 
Nonetheless, digital methods present distinct new affordances, and challenges, to 
microhistorical practice moving forward. It is also the case that digital history 
could greatly benefit from microhistory, particularly when it comes to a focus on
agency and narrative. This essay explores the twists and turns of these developing 
and potential relations. 

The potentials worth exploring here emerge from the dizzying multiplication 
of computer-based tools, web archives and digitized materials that have transformed
the practice of history in the past twenty years. As Lara Putnam pointed
out in her article on scale in history, “technology has exploded the scope and speed 
of discovery for historians” in ways that have had a profound impact on our research
(Putnam 2016: 377). Not so long ago, when most historical research meant traveling 
to an archive, sometimes at great expense of time and resources, calling up infor- 
mation from typed inventories or card catalogs, requesting specific materials, and 
transcribing data from them by hand, the collection of evidence was necessarily 
limited by the nature of the research process. Now, when a researcher need only 
travel as far as their favorite chair while poring over fifteenth-century Veronese 
wills, digitized copies of seventeenth-century plays, or medical remedies from sev- 
enteenth-century Japan, the very notion of limits works much differently. Even 
when traveling to archives is still necessary, technology eases the process; phone 
cameras with the capacity to take professional-quality pictures of documents, then 
store them in the cloud, offer greatly enhanced levels of data gathering. Likewise, 
digitized archival materials are becoming available online at an accelerated rate. 

If collecting materials has allowed research to scale up, tools have likewise 
allowed us to speed up in a range of ways. The process of organizing and then 
analyzing documents has, by degrees, also become much easier. Programs like 
Tropy and Omeka allow for document organization and the creation of metadata 
for digitized collections both large and small. OCR and programs like Transkribus 
are moving closer to automated transcription of even the most difficult handwriting.¹
Interfaces like Voyant allow us to perform keyword searches, analyze proximity
and terms, and trace when and where a word is used.² If I want to research
cultural perceptions of dueling in the seventeenth century, a quick Google Books 
search turns up the entry on dueling in a seventeenth-century French/Italian dic- 
tionary, numerous Italian treatises on dueling, an eighteenth-century musical, 
some poetry, sermons, and accounts of historic duels, all within the course of a 
few minutes, generating more material than can be analyzed without computational
methods in any one lifetime. In short, the digital revolution has widened
the research pipeline exponentially. 

1 Readcoop, accessed June 28, 2023, https://readcoop.eu/transkribus/.
2 Voyant, accessed June 28, 2023, https://voyant-tools.org/.

Not only has the computer changed the way we research primary sources, but
the online availability of monographs, articles and teaching materials has also
changed the practice of historiography and reading in the field. Journals are available
online and easily searchable; some are freely available and open access, as are
the articles of this collected volume. Monographs, likewise, are easily accessible in
electronic copies. Digital book repositories like Cambridge Core, ACLS e-books online,
and Perlego have made entire libraries available at considerably reduced expense.
Scholars can upload articles and presentations to ResearchGate, Academia.edu
and Humanities Commons. A quick JSTOR search on the history of prostitution
can turn up hundreds of articles; fifty years of scholarship on the Italian Renais- 
sance resides on ACLS e-books online and Cambridge Core. The net effect is that 
decades of research are easily available at our fingertips, frequently obviating the 
need for a trip to a physical library, a search through bound book indexes or a 
glance at printed bibliographies. 

In the context of an ongoing, or potentially reinvigorated, interest in more 
conventional historical methods, these developments are not wholly unproblematic.
In her seminal article “The Emmet's Inch: Small History in a Digital Age,”
Julia Laite articulates the ambivalence of digitization: 


The boundlessness of the past has always been kept in check not only by the boundedness of 
the archive and library but also by our own cognitive and physical abilities to identify, search, 
collect, and connect records. This, argues Rigney, is where history meets the sublime—where 
historians admit the limitations of their ability to know, comprehend, and represent a bound- 
less past. So, what happens now that our ability to chase so many people out of the bounded 
archive has become so much greater, faster, and finer grained? (Laite 2020: 968) 


As Laite points out, context can be key for understanding materials. Returning to 
our examples above, does it make a difference if one can see the folder which 
contains the Veronese wills? Does it make a difference which collection they are 
housed in? Will seeing and touching the scroll for the seventeenth-century Japa- 
nese remedy for headaches make a difference to interpretation? Will touring the 
royal theater that premiered the 1780 play about dueling make a difference in 
our interpretation of it? As critics have pointed out, sometimes digital work elides 
context in ways that can make historical work more shallow.³

3 See Mykola Makhortykh, Aleksandra Urman, and Roberto Ulloa, “Hey, Google, is it What the
Holocaust Looked like?” First Monday 26.10 (2021), accessed June 28, 2023,
https://firstmonday.org/ojs/index.php/fm/article/view/11562.

Other drawbacks to the scale afforded by digitization can be seen in the constraints
that search algorithms place on research. They at once enable broader
searches and shape them in ways that the user may not intend. Google Books
search is dependent on algorithms that index imperfectly. Article databases require
searches to be precise at the metadata level in order to be effective. Online
digitized collections still require the same labor as the in-person archive and in
some cases more, depending upon organizational structures and the availability
of a search function. And the sheer number of materials to work with can create
a sense of vertigo more severe, in many cases, than that prompted by the physical
process of exploring a material archive, however large its footprint or extensive
its holdings. At greatest risk of loss, perhaps, is the tangible serendipity of the
archive and the library, the mental catalogs of collections in the minds of archiv- 
ists, and the expertise of years spent sifting through dusty and crumbling docu- 
ments. Computers are not yet able to suggest with any precision a collection that 
one may want to consult if one is interested in the history of prostitution in six- 
teenth-century Venice. Folders upon folders of inquisition cases have yet to be 
consulted. 

Methodologically speaking, microhistory would naturally seem opposed to 
the scaled-up practices afforded by digitization and digital methods. One relies on 
metadata and computer-aided searches, the other on serendipitous archival finds. 
Unsurprisingly, when asked about digital humanities in a recent interview, the 
microhistorian Carlo Ginzburg had the following to say: 


As for big data, are they able to detect anomalies? Or do they tendentially erase them? I 
have actually debated this topic in a written dialogue with Franco Moretti. The danger of 
working with big data lies in the erasure of anomalies because they focus on convergences 
and norms. (Dayeh 2022: 218) 


In Ginzburg's view, it is hard to find Menocchio in a spreadsheet. 

Many microhistorians who are highly vocal about the debates in their own 
field, however, have not weighed in at length on the digital revolution. Even 
though they have much to contribute to the debates about digital history, they 
have been mostly silent observers, aside from a few notable recent examples that 
will be discussed later (Trivellato 2015: 122). The silence is particularly notable 
when remembering the sometimes vociferous critics of cliometrics and social sci- 
ence methodology. Indeed, it was these developments in historical methodology 
that inspired microhistorians to take a different scope. Despite their occasional
resemblance to cliometrics, digital methods also have a great deal to offer the microhistorian.
New digital tools and methods allow us to trace clues in novel
ways. The digitization of little-studied, hard-to-find records, manuscripts,
early printed works, and archival collections has made uncovering context
and tracing clues easier. Tim Hitchcock argues that digitization has allowed
us to ‘radically contextualize’ by hunting down clues in physical archives and online
archives (Hitchcock 2013). Does digitization give us access to hundreds if not
thousands of Menocchios, making the normative exception, in fact, normal?

Indeed, digital history may be helping to bring about a resurgence in micro- 
history. Whether motivated by fear of losing sight of the historical subject entirely 
in a sea of numbers, online archives and maps, or desires to carve narrative out 
of electronic frontiers, the number of microhistories or reflections thereof appears
to be growing rather than shrinking.⁴ As Thomas Cohen mused in
a recent article commemorating the fortieth anniversary of the publication of 
Carlo Ginzburg’s The Cheese and the Worms, “microhistory is alive and kicking; it 
still intrigues writers, beguiles readers, and charms abundant students” (Cohen 
2017: 53). Cohen’s observation seems to hold true; as already noted, microhistories 
are still being published both in monograph and article format along with robust 
discussions of methodology. 

In fact, looking at recent titles and monographs, one could have the impres- 
sion that microhistory is not only alive and kicking, but is experiencing something 
of a revival. To name a few notable examples from the past five years, articles 
employing microhistory as a methodology have been published on the life of a 
workhouse pauper in nineteenth-century London, on the persecution of lesbians by the
Gestapo, and on black women's social activism in St. Louis (Jones 2022; Marhoefer 
2016). Moreover, several prominent journals have given microhistory particular 
attention. In 2017, the Journal of Medieval and Early Modern Studies published a 
special issue on microhistory with reflections from Thomas Cohen and István Szijártó.⁵
The social history journal Past and Present recently published a special
issue on the conjunctures between global history and microhistory.⁶ The American
Historical Review published the reflections of several prominent historians
on scale in history in 2016. The cumulative effect of these journals' special issues 
has been to keep microhistory visible in discussions of scale. 

Books are still being published in the genre by academic presses, including 
recent works on the microhistory of the picaresque novel with contributions by 
prominent microhistorians Giovanni Levi and Matti Peltonen (de Haan and
Mierau 2014). Under the editorial helm of Sigurður Gylfi Magnússon and István
M. Szijártó, with an editorial board featuring Carlo Ginzburg, Simona Cerutti, Edward
Muir, Giovanni Levi, Jacques Revel and Matti Peltonen, among others, Routledge
began publishing a new generation of microhistories in 2018.⁷ Perhaps most
tellingly, the journal Social Science History recently published Richard Huzzey's
work, which uses prosopography and network analysis to examine British anti- 
slavery petitions (Huzzey 2019). Such developments might well reduce our con- 
cern that the changes wrought by digital tools and affordances are, on balance, a 
threat to microhistorical work. 


4 A non-scientific and cursory scan of the number of indexed works using the term microhistory 
in the title shows that there is indeed an increase since 2011. 

5 Special issue on Microhistory, Journal of Medieval and Early Modern Studies 47.1 (2017). 

6 Past & Present, Volume 242, Issue supplement 14. 

7 This series follows Sigurður Gylfi Magnússon's and István M. Szijártó's What is Microhistory?
Theory and Practice (London & New York: Routledge, 2013).

Spatial humanities also offer some possible solutions to the potential unease
associated with scaling up, bringing into relief understandings of geography, 
place and space as sometimes interconnected, sometimes separate realms of ex- 
perience. How we understand space in a kinetic sense shapes our “personal 
space,” including our tolerance for touch by strangers and non-strangers, feelings
in crowds and wayfaring. How we understand space in a conceptual sense leads 
to our sense of direction and perception of the landscape and our place in it. 
There are many varieties of space and place, all shaped by historical, cultural,
social and political contingencies.

Moreover, the visual and spatial elements available through locative affor- 
dances can bring striking new dimensions to narration as conventionally under- 
stood. Menocchio’s sense of place was prominent and his cosmopolitanism clear, 
and it could certainly be illuminating to walk in the same streets as Menocchio 
guided by a tour app. One could get a feel, both spatially and phenomenologically, 
for the world that undoubtedly shaped his worldview by seeing the views he saw, 
visiting the sacred places he frequented, studying the church frescoes he gazed 
upon, exploring the building in which he did his business as mayor, and surveying
the river on which his mill was built. Thus the advantage of combining
spatial history with microhistory in particular: the ability to find the exceptional
normal with more richness and precision.

Very few would argue that points on a map convey the same detailed infor- 
mation or narrative punch as the tale of Menocchio. Microhistory could teach dig- 
ital history a great deal about how to tell stories. As Lincoln Mullen points out: 


historians have long prized the art of storytelling and have often focused on the particulars 
of history, including individual lives, as a mode of communicating the complexities of the 
past. Yet digital history (at least computational history, rather than digital public history) 
has tended to pull historians toward the abstract and the generalizable at the expense of 
storytelling and historical particulars. (Mullen 2019: 383) 


Works in digital history often fall short of what we call the braided narrative or 
interweaving of method and story. 


2 A brief history of microhistory 


To understand further how microhistory could benefit spatial history and vice 
versa, we should first outline it with a bit more precision. István Szijártó defines
microhistory by three characteristics: 1) microscopic focus on a well-delineated
object of study; 2) engagement with the “big” historical questions, partially in response
to the Annales school and the longue durée; and 3) a stress on the agency
of the historical subject, particularly in the case of Italian microhistory (Szijártó
2022). Specifically, the historian is to look for what Grendi defined as the “exceptional
normal”, or the normative subject. The scale of observation is reduced: the curious
miller in the Friuli, the village exorcist, the returning soldier, the New England 
midwife and the scorned young woman. This allows for “a meticulous reconstruc- 
tion of events and relationships, and a juxtaposition of conflicting sources con- 
cerning the same event” (de Vries 2019: 23). 

Alongside Giovanni Levi’s Inheriting Power (1988), Carlo Ginzburg’s The Cheese
and the Worms (Il formaggio e i vermi) was at the forefront of microhistory and
remains the best-known work in the field to an Anglophone audience. Published
in 1976 in Italian, The Cheese and the Worms made the methodological
claim that focused attention on one biography can tell us as much
about large topics such as the Counter-Reformation, the diffusion of print culture, 
the Venetian empire and the everyday lives of the non-elites as history at the 
more traditional scale. The establishment of the journal Quaderni Storici gave an- 
other forum to microhistorians and helped disseminate their work in Europe, the 
UK and the United States. As microhistory leapt across the Atlantic,
Anglo-American historians adopted the methodology. Works like Laurel Thatcher Ulrich’s
A Midwife's Tale, Robert Darnton's The Great Cat Massacre and Natalie
Zemon Davis's tour-de-force The Return of Martin Guerre became popular instances
of this adoption. Anglophone historians of Renaissance and early modern
Italy also began to write works of microhistory, including Gene Brucker's Gio- 
vanni and Lusanna, Thomas and Elizabeth Cohen's Words and Deeds in Renais- 
sance Rome and Judith Brown's Immodest Acts. Microhistory took its place in the 
methodological toolbox alongside social history and cultural history. And the in- 
creasing popularity of global history and transnational history did not signal the 
death of microhistory as some feared. In fact, the field was transformed and en- 
riched by new approaches and particularly by what has come to be known as 
global microhistory (Ghobrial 2019). 

Tonio Andrade and others after him argued for a world history less social-scientific
in approach and more attentive to narrative, as the sheer scope and scale
of analysis in global history tends to elide the stories and voices that most matter.
Such elision on the part of global historians has been judged shortsighted: “we've
tended to neglect the human dramas that make history come alive. I believe we
should adopt microhistorical and biographical approaches to help populate our
models and theories with real people” (Andrade 2010: 574). The microhistorian
Francesca Trivellato has argued similarly and advocated for more productive
conversations between microhistory and global history. 

A recent example, Daniel O'Quinn's 2018 monograph Engaging the Ottoman Empire:
Vexed Mediations, 1690–1815, focuses on the linkages between Europeans’
and Ottomans’ eighteenth-century itineraries by tracing a series of nodes and
networks (O’Quinn 2018). In a series of connected microhistories, O’Quinn contemplates
geographical connections, maps and space or the lack thereof. Using microhistory
and cultural analysis, O'Quinn explores “a series of intimate encounters, some of
which have lasting geopolitical ramifications” (2018: 7), for what he calls constellatory
analysis. His combination of quantitative and qualitative analysis, microhistory and global
history promotes new insights on the global early modern.

Global history, however, is attentive to space as an important object of histor- 
ical analysis in the way that some other methodologies are not. As Lara Putnam 
points out, there are many geographic claims in history (Putnam 2006: 616). Many 
of these claims are local, regional, state-level and national, but that simplifies
the complexities of interactions with and within space. In their introduction to a 
special issue on space and the transnational, Bernhard Struck, Kate Ferris
and Jacques Revel point to this contradiction in perception of space and the accompanying
trap that can ensnare historians, noting that “historical and social
processes cannot be apprehended and understood exclusively within customary,
delineated spaces and containers, might they be states, nations, empires, or regions”
(Struck et al. 2011: 573-574). Indeed, historians tend to under-theorize
space:


As historians, we know we must draw artificial but useful boundaries in time in order to be 
able to make meaningful statements about historical developments. We call this periodiza- 
tion. We also need to do the same thing for space. That is, we need to think consciously, 
argue intelligibly, and reach (ever-provisional) collective conclusions about the spatial units 
that will allow us to talk about large-scale trends and patterns in a meaningful way. As far 
as I know, there is no consensus term for this process. (Putnam 2006: 620) 


Geographies lend themselves easily to history, but space doesn't shape the discussion
of history outside of the geographic. This is a problem when one considers
that the nation-state is a construct that social actors don't necessarily live within. 
Actors are more likely to be shaped by streets, fields and landmarks than they are 
by the fact that they live within the Venetian Empire or Colonial India. When 
asked about their experiences, they give us different impressions of space than 
we impose upon them in our field definitions of history: Latin America, Eastern 
Europe, Indian Ocean and the Atlantic World. 

In general, microhistory, thanks to its discrete focus, is still attentive to actors 
in space, especially in a symbolic and cultural way. George R. Stewart wrote sev- 
eral influential books in the field specifically focused on place (Ginzburg et al. 
1993). His book, Pickett's Charge: A Microhistory of the Final Charge at Gettysburg, 
July 3, 1863 analyzed a decisive battle in the American Civil War that lasted less 
than half an hour (Stewart 1987). As Ginzburg notes in a discussion of this work,
“the outcome of the battle of Gettysburg is played out in a matter of seconds between
a clump of trees and a stone wall” (Ginzburg et al. 1993: 12). Microhistory
also had its origins in local history. In Inheriting Power and subsequent reflec- 
tions, Giovanni Levi addressed this focus: 


Microhistory is more concerned with the symbolic significance of space as a cultural datum: 
it concentrates therefore on a precise point, but one which may be invested with different 
characteristics by the different players involved, and be defined not only by its geographical 
situation but by the significance attached to a place and a given situation that may be con- 
tained or determined by a broad range of connections and thus linked to other, widely dis- 
tributed spaces. It seems to me to be less characterized by spatial dimensions than by the 
network of meanings and interrelations set up by the particular phenomenon being studied, 
and reads places as ever-evolving cultural and social constructions. (Levi 2019: 40) 


There are many commonalities between Levi’s above dictum and the approach of 
spatial history. 

Influenced by Lefebvre, Foucault and Baudrillard's theorizations of space, the
spatial turn was a push past an exclusively geographic understanding of space. 


3 The spatial turn 


Spatial history is often conflated with GIS, however, and the two are not the same. While it is
hard to pinpoint the precise moment(s) of the (or a) spatial turn in history, not 
least because space and history cannot be disconnected from one another, it is a 
little easier to identify the point at which mapping and geospatial approaches, 
particularly those using Geographic Information Systems (GIS), became more 
prevalent. Anne Kelly Knowles’ monograph, Placing History: How Maps, Spatial
Data, and GIS are Changing Historical Scholarship, was one of the first influential
discussions of this approach, followed by the work of Ian Gregory, Tim Cole, and 
many more scholars (Knowles and Hillier 2008; Knowles et al. 2014; Gregory and 
Ell 2007). Research groups such as The Spatial History Project at Stanford were 
established, courses and institutes were offered, tutorials were written and proj- 
ects proliferated. 

The reasons for the increasing popularity are clear, since the goals and benefits 
of historical GIS are manifold. Not only is it a means of visually representing spatial 
information, but also the quantitative and qualitative techniques that underpin GIS 
allow for multi-layered investigation. As Tim Cole has pointed out, this combination 
of methodologies is one of the innovations of doing this sort of work: 


It is the integration and almost seamless passage between one set of tools and another, from 
one method and technique to another, that makes GIS a powerful tool within a broader
range of digital humanities approaches that draw on the processing power of computer
technologies to read the archive differently. (Cole and Giordano 2021: 274) 


GIS can put historical maps, hundreds of analyzable data points, geographic refer- 
ents and powerful visualizations at our fingertips. It can make it easier to ex- 
change data, share research and do large-scale comparisons. Historical GIS can 
readily allow for the creation of public-facing, interactive exhibits, projects and 
experiences for non-historians. It may seem that historical GIS is simply a return 
to geographic-based history. After all, part of the methodology involves drawing 
geographic boundaries: creating maps, delineating them with lines and incorpo- 
rating topography. Many projects are built upon streets, fields and land-surveys. 

Yet within the past decade, spatial humanities scholars have begun to repur- 
pose GIS for investigations of space and place. Tim Cole and Alberto Giordano 
used GIS to look at what they called the “geography of oppression” in Budapest 
and Italy during the Holocaust (Giordano and Cole 2018). In particular, Cole and 
Giordano argue for a definition of space consisting of three dimensions: location, 
locale and sense of place. In their view, a GIS of place will bridge the gap, both 
epistemological and ontological, “between the humanities and GIScience” (Gior- 
dano and Cole 2020: 842). 

Several recent historical GIS projects are starting to realize similar goals. Nota- 
ble projects include Mapping Decline: St. Louis and the American City, which allows 
a user to examine changing racial demographics and their impact on an American 
city from a variety of types of data collected during the last century.⁸ The Atlantic
Networks Project helps visualize the North Atlantic slave trade, including patterns
of death and the seasonality of the passage.⁹ The Atlas of Early Printing allows researchers
to map trade routes, universities, fairs and borders on the networks of
early printed books.¹⁰ The Digitally Encoded Florentine Census (DECIMA), which
will be discussed in more detail presently, allows users to explore and map several early
modern Florentine censuses and to examine various aspects of the urban environment,
including where people with certain occupations lived, the composition of women's
convents and the diffusion of prostitution across the city.¹¹

8 “Mapping Decline: St Louis and the American City”, accessed June 28, 2023, http://mappingdecline.lib.uiowa.edu/.
9 Andrew Sluyter, “The Atlantic Networks Project”, accessed June 28, 2023, https://sites.google.com/site/atlanticnetworksproject.
10 University of Iowa, “The Atlas of Early Printing”, accessed June 28, 2023, http://atlas.lib.uiowa.edu/.
11 Decima, accessed June 28, 2023, https://decima-map.net/.

Historical GIS can, of course, favor the sort of work that microhistorians are
critical of, especially quantitative approaches. Indeed, microhistory evolved as a
partial response to the turn towards social scientific methods in history, particularly
the use of large datasets and statistical methods. This turn towards social
science was itself enabled by the development of machine-readable datasets and 
hastened by personal computers and programs like SPSS that made storing and 
crunching data possible at an unprecedented scale. Historic demographic and 
population data could be more freely analyzed, plague mortality rates could be 
traced, economic crises outlined and birth rates scrutinized. David Herlihy and 
Christiane Klapisch-Zuber's computerization and analysis of the 1427 Florentine
Catasto was an example of a history of this type (Herlihy and Klapisch-Zuber 1985). First published in French
in 1978 and in an English translation in 1985, Tuscans and Their Families breaks 
down a historic Florentine census, analyzing household demographics and wealth 
including data on professions, marriage rates, birth rates and the gender of heads 
of households. 

Reviews of Herlihy and Klapisch-Zuber’s quantitative work exemplify the critiques
of such approaches. Many of the reviews pointed out that numbers could be
fuzzy and sloppy and thus hard to generalize from. Like many works of social
scientific history, Tuscans and Their Families made big claims that did not hold up
to scrutiny under a magnifying glass. And from the beginning, a persistent criticism
of quantitative approaches was that, in turning important histories into numbers
and percentages, they could make for dull reading. Whether or not works like Tuscans
and Their Families are slow going for individual readers, they certainly elicited
other kinds of backlash, particularly for claiming to get at history from below. De- 
mographic numbers on the marriage age of peasants were indeed novel in that 
they focused on peasants as an object of analysis. However, from such data, one 
could not actually get a sense of the peasant, the range of his or her marriage 
choices, the influence of family preference and the real terms of the dowry or 
bridal gift, whether it consisted of a silver spoon or a promise of cash in the future 
brought about by the sale of crops. While social scientific history illuminated some 
things previously difficult to see, particularly without the aid of computer analysis, 
it still, intentionally or otherwise, made structure and thereby the elites a persistent 
focus in historical analysis. While the Florentine Catasto provided certain insights 
concerning, for instance, the number of female-headed households, it did not aid 
an understanding of the everyday life of women. Instead, the Catasto is in many 
ways a document that tells us more about the desires of the elite to understand 
how they could enhance the practice of tax farming. Enter microhistory. 

In some ways, microhistory vs. digital history is a false methodological dichot- 
omy. Very few historians are exclusively microhistorians any more than digital his- 
torians are exclusively digital. Most of us borrow methods and materials as sources 
and subject dictate. In some ways, however, both are methodological outliers.
4 Playing with scale


While microhistorians rarely become digital historians, many digital historians 
borrow from microhistory. One obvious explanation for this is that microhistory 
is a more established approach than digital history as we have now come to think
of it. Another explanation is that digital history as a methodology is only now 
being taught more widely in graduate programs. Yet another reason is that the 
ego-documents that form the basis for many microhistories lend themselves to 
certain types of digital humanities analysis. One example of microhistory’s influence
on digital history is Cameron Blevins’s topic modeling of Martha Ballard’s
diary, a source that became the basis for one of the first Anglophone microhistories,
Laurel Thatcher Ulrich’s A Midwife’s Tale: The Life of Martha Ballard
(1990). Martha’s diary contains over 10,000 entries, which by any standard makes it
difficult and time-consuming to analyze.¹² Using a package for natural language
processing (NLP), Blevins generated a list of thirty topics and thematic trends in
the diary, confirming some of Ulrich's hypotheses concerning an increase in the
usage of certain words over time. Blevins noted that in some cases, the NLP analysis
was more useful than traditional hermeneutics. Blevins's work and interpretation
of the data enrich and augment Ulrich's reading.
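
The idea of checking whether the usage of a word increases over time can be illustrated with a far simpler technique than topic modeling: counting how large a share of each year's words a given term represents. The sketch below is only an illustration under stated assumptions. It is not Blevins's method or code, the function name `term_rate_by_year` is hypothetical, and the sample entries are invented placeholders rather than quotations from the diary.

```python
import re
from collections import defaultdict


def term_rate_by_year(entries, term):
    """Return {year: share of that year's words equal to `term`}.

    `entries` is an iterable of (year, text) pairs. This is a plain
    relative-frequency count, not a topic model.
    """
    hits = defaultdict(int)    # occurrences of `term` per year
    totals = defaultdict(int)  # all words per year
    for year, text in entries:
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(words)
        hits[year] += words.count(term)
    # Skip years without any words to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Invented placeholder entries (assumption), not text from the diary.
sample_entries = [
    (1790, "clear and cold at home"),
    (1790, "called to a birth at night"),
    (1800, "cold and snow cold wind at home"),
]

rates = term_rate_by_year(sample_entries, "cold")
for year, rate in rates.items():
    print(year, round(rate, 3))
```

A count of this kind can only flag that a word becomes more or less frequent. Deciding what that change means still depends on the close reading that microhistory relies on.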

Approaches that rely on larger quantitative datasets might seem antithetical 
to the endeavor of microhistory. Much of the work of GIS, for example, is quanti- 
tative, and often it includes hundreds if not thousands of data points. Indeed, 
much of the work is quantitative enough that many practitioners of historical GIS 
have colloquially described it as 70 percent data collection and preparation. His- 
torical GIS would seem to be the least compatible with microhistory for quantita- 
tive and technical reasons. Yet GIS and spatial history have a great deal to 
contribute to microhistory. In Geographies of the Holocaust, Tim Cole, Alberto 
Giordano, and Anne Kelly Knowles point out that investigating events like the Ho- 
locaust must be conducted using a variety of methods, quantitative and qualita- 
tive alike: 


Investigating the where of these Holocaust events necessarily means working at a variety of 
scales, for they took place from the macro scale of the European continent; through the national,
regional, and local scales of individual countries, areas, and cities; and down to the
micro scale of the individual body. (Knowles et al. 2014: 12) 


Cole, Giordano, and Knowles thus advocate for this mixed-methods approach that 
includes GIS, visual analysis and qualitative methods. They argue further that 


12 Cameron Blevins, “Topic Modeling Martha Ballard's Diary,” April 1, 2010, accessed June 28,
2023, https://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.
“spatial analysis and geo-visualization can complement and help specify the humanistic
understandings of space and place by exploring and quantifying relationships
among things and people to discover and visualize spatial patterns of
activity” (Knowles et al. 2014: 15). By using a braided-narrative approach, they
zoom in and out on the various geographies, spaces and places that were impacted
by, and had an impact on, the Holocaust.

Other works are illustrative of the productive insights of this approach that 
combines GIS and microhistory. In the most recent book in Routledge’s Microhistory
series, Neighbors of Passage: A Microhistory of Migrants in a Paris Tenement,
1882-1932, Fabrice Langrognet argues that digital methods and tools can in fact
make microhistory easier: “microhistory from scratch, so to speak, is now within
the realm of possibility. New digital tools and databases make it easier to track
specific individuals in the sources” (2022: 10). To produce a microhistory of a tenement,
Langrognet collected data about the tenement’s occupants in birth and
death registers and explored municipal archives, naturalization files, police regis- 
ters and other quantitative sources for those who populated this community. Fol- 
lowing the clues and seeking contextualization with mapping, he produces a rich 
history of the geographies as well as spaces and places of migration. 

Newer studies like these align with István Szijártó's idea of microhistory as a
junior partner, or microhistory “built on a partnership rather than
a rivalry of the two macro and micro approaches”, or a microhistory that would
inform and benefit from macrohistory, quantitative history or digital history (Szijártó
2022: 211). The combination of macro and micro approaches has been labeled
as Microhistory 2.0 or even as a third wave of microhistory, though what
such terms mean is open to debate (Renders and Veltman 2021). Part of the resur- 
gence in microhistory, these new forms of microhistory retain the earliest goals 
of the methodology while also incorporating insights from newer fields like global 
history. Microhistory 2.0 aims to be particularly attentive to non-western perspec- 
tives and the colonization of archives. 

Other advocates of Microhistory 2.0 have argued for an incorporation of digi- 
tal humanities. Fabrizio Nevola has talked about a Microhistory 2.0 that produc- 
tively incorporates digital humanities methods, tools and techniques to create 
interactive narratives (Nevola 2016). He and his colleagues developed the Hidden 
Cities phone app, which allows users to interactively experience Renaissance 
Florence. Users are led on a guided tour narrated by a series of archetypes, including
Cosimo de' Medici, a female silk weaver, a widow and a policeman, actualizing
what Nevola has labeled a new form of microhistory. Digital history projects
can also lead to a Microhistory 2.0. The growth of larger scale digital humanities 
projects, datasets and new tools has transformed the field of Italian history in 
particular. Projects like DECIMA, a historical GIS project which maps early modern
census data from Renaissance Florence on the sixteenth-century Buonsignori
map, have reshaped the field and allowed researchers to reconsider scale in unprecedented
ways. With DECIMA’s web-GIS interface, for example, one can zoom
in and out and play with scale in ways that elicit narratives that would be impossible
to make visible through conventional methods. In a companion edited volume,
several scholars reflect on how they have used DECIMA to
zoom in on histories of nuns and prostitutes to reconstruct otherwise inaccessible
stories. Working on nuns, Julia Rombough and Sharon Strocchia examine placement
patterns of female religious in relation to their social networks and the
world in which they lived. As they point out, “the convent has rarely been utilized
as the main lens through which to view how Italians constituted social networks, 
distributed religious patronage, and strengthened ties to other people and places 
within the early modern city” (Rombough and Strocchia 2018: 89). Using the 
1548-1552 Florentine census of nuns and the 1561-62 population census, they 
mapped the women of religious houses and their proximity to their familial pal- 
azzos and neighborhoods. Their research showed that placement of women in 
geographically distant convents, instead of religious houses closer to their family 
complexes, was part of a wider pattern to diversify economic and social ties out- 
side neighborhoods many families had been living in for centuries. The mapping 
also highlights how female networks, in which convents were key nodes, played 
crucial roles in family strategies and fortunes. It also sets into relief how powerful 
these institutions were in the urban landscape: 


Because these socio-spatial networks permitted both people and information to flow into and 
out of the convent with both rapidity and regularity, cloistered religious women could stay 
abreast of neighborhood news, keep tabs on property values, grasp dynamics of heated dis- 
putes and participate in broader forms of civic discourse. (Rombough and Strocchia 2018: 97) 


Rombough and Strocchia's findings challenge the claims of previous scholarship 
that Florentine women were confined to and operated exclusively within the pri- 
vate space of the home and convent almost wholly unconnected to the wider world 
apart from their husband's family (in the case of married women) or their fellow 
nuns (in the case of religious women). Thus, this sort of analysis opens up an understanding
of a variety of spaces: social, cultural, mental, relational and so forth.
Such projects show that the macroscale of digital humanities projects with 
tens of thousands of points of data and the microscale of a thick inquisitorial de- 
position are not incompatible. For historians of early modern Italy in particular, 
GIS and spatial history have increasingly become a path to doing both qualitative 
and quantitative investigations that complement one another. GIS and spatial his- 
tory are not just looking at thousands of Menocchios; geospatial approaches can 
allow historians to zoom in and out to explore the threshold of a noble palazzo,
the working-class tavern and the neighborhoods prostitutes lived in, as well
as bandits on the borders between Bologna and Modena and the spaces where
guns were most likely to be used. Carlo Ginzburg’s Menocchio is easier to understand
if you have a basic concept of where Domenico Scandella lived. In a special
issue of the American Historical Review, David Aslanian argued that


there seems to be an inverse relationship between scale and human agency; in other words, 
the greater the scale of analysis (temporally or spatially), the less room is left for accounts 
of human agency. (Aslanian et al. 2013: 1444) 


A combination of digital spatial history and microhistory may allow us to uphold
microhistory’s political commitment to understanding agency. Since the emergence
of microhistory, its practitioners have been criticized for abandoning the original
political focus of Italian microhistory in favor of narrative and novelty (Guldi and
Armitage 2014). While many of these criticisms are unfair, there is no question 
that microhistory is sometimes branded as more palatable history that is suitable 
for public consumption, irrespective of political commitments to the narrative 
project and a desire to tell the stories that have been pushed aside in favor of 
narratives of structure. Digital spatial history and GIS could help return microhis- 
tory to its conceptual and ideological roots. 

In her article, “Mapping Working-Class Activism in Reconstruction St. Louis”, 
Elizabeth Belanger advocates for this type of GIS work: 


Blending the tools of micro-history with historical Geographical Information Systems (GIS) 
permits us to chart the social networks and everyday journeys of black working-class 
women activists and the middle-class men with whom they came into contact. Social and 
spatial ties shaped the activism of St. Louis’ working-class women; mapping these ties re- 
veals the links between everyday acts of resistance and organized efforts of African Ameri- 
cans to carve a space for themselves in the restructuring city and make visible a collective 
activism that crossed class and racial boundaries. (Belanger 2020: 354) 


Belanger’s work highlights the extent to which black activists mobilized space to
protest segregation and the political spheres of influence in which they operated.
By geolocating black churches and activist homes in post-Reconstruction St. Louis,
she unfurls the stories of black, working-class women. GIS and mixed methods
can provide agency for communities that have literally been erased from the
map. In The Rosewood Massacre, Edward González-Tennant uses geospatial analysis
and anthropology to recover the stories of lost communities like Rosewood,
Florida, a town that was literally burned to the ground in 1923 (González-Tennant
2018). As Simone Lässig has noted of approaches that combine the two:
digital history makes it possible to productively intertwine macro- and microhistory, also freeing
historical figures from an anonymity that the logic of archival collections had previously
relegated them to [. . .] digitization offers paths to new sorts of history from below - an approach
to less prominent people historians have pursued since the 1970s. (Lässig 2021: 18)


Digital spatial history is particularly suited to this endeavor. Digital spatial history 
can help us follow clues, uncover structures that were previously not apparent 
and focus on and better understand the agency of historical actors. 

The methodologies of microhistory may now be more important than ever 
with the scaling up of history: 


microhistory's central argument – that a variation of scales of analysis breeds radically new
interpretations of commonly accepted grand narratives – has acquired new urgency as globalization
and its discontents demand that historians produce new grand narratives about
the ways in which interconnections and hierarchies have developed on a planetary scale.
(Trivellato 2015: 122) 


Similarly, this scaling up of history can be put to microhistorians' use. Methods of 
digital history could help microhistorians find the exceptional normal in a cache 
of documents, follow clues and illuminate the mentalities, particularly the spatial 
ones, of their subjects. The methods of digital history could help microhistorians 
track down clues with more ease, particularly if those clues are in far-flung ar- 
chives or could be found in collections that haven't been touched. Apart from ar- 
chival clues, digital spatial history can help establish a foundational sense of 
place in a way that further illuminates the subjects of microhistory. In addition to 
sharing similar aims, microhistory and digital spatial history have much to offer 
one another. 
Mariaelena DiBenigno and Khanh Vo
Scaling Digital History and Documenting
the Self-Emancipated


Abstract: This chapter explores scale in public and digital history projects by ex- 
amining runaway advertisements from nineteenth-century Virginia newspapers 
placed by members of the Founding Generation. As digital history allows for new 
scalability, the chapter documents existing databases to frame microhistories of 
enslaved individuals within the larger historiography of slavery, agency and re- 
sistance. The microhistories captured within runaway advertisements are often 
fragmented and unfound, creating evidentiary gaps within the scholarship of enslaved
narratives. Our digital project's intervention proposes reading each runaway
advertisement as a micro-map of the enslaved individual's life and historical
environment. This interdisciplinary approach uses the digital humanities method 
of thick mapping to juxtapose temporal, spatial, geographic, environmental and 
historical data into a layered reconstruction of networks, community and kinship 
within enslaved communities from the early United States. 


Keywords: digital humanities, digital history, public history, slavery, community 
networks 


In the United States, debates about the history of slavery involve those who center it 
in national origin narratives against those who deny its critical role in our current 
society. A digital history of slavery adds further tension: how can the history of 
bondage be both a national and personal narrative, and how might technology help 
us examine mythologized historical periods using documents produced in those 
eras? Runaway slave advertisements, in particular, help to complicate flattened representations
of the American Revolution and its elevation of Enlightenment ideals.¹
That the Founding Generation surveilled and pursued self-emancipated individuals
generates powerful discussions about life, liberty and the pursuit of happiness.²


1 Morgan (1975) and Taylor (2021) write of the paradox of the American Revolution, that the rise 
of liberty and equality in America accompanied the rise of slavery. Within this paradox is the 
republican ideal of freedom, formed from the distinct English conception of freedom born out of 
Enlightenment thinking that permeated England's national identity and imperialization efforts. 

2 The 1688 Germantown Quaker Petition is the earliest known document that called for outright 
abolition of slavery in the American colonies on grounds of universal human rights, drafted by 
Francis Daniel Pastorius for the Germantown Meeting of the Religious Society of Friends in Penn- 
sylvania. In 1790, Benjamin Franklin and the Abolition Society put forth a similar petition to the 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial 4.0 International License.
There are also primary source examples of these advertisements because of their
attachment to historically privileged persons. Does this wealth of records skew our 
understanding of enslavement? Is this just another example of how the Founding 
Generation dominates knowledge production about the early United States? 

Here, it is important to consider the upcoming 250th anniversary of the United
States. Public historians and museum professionals are working to “reframe history”
in the congratulatory aftermath of the 1976 bicentennial in the United States.³ Planning
organizations remind us to “make sure that Americans of all ages and backgrounds
and in all places see themselves in history”. Scale is useful as we work
to focus on the underrepresented and zoom away from traditional top-down ver- 
sions of history. Mapping the advertisements of enslaved individuals connected to 
the Founding Fathers raises questions about whose history gets told and exemplifies 
the complex, snarled system of slavery. However, questions still remain: how far- 
reaching is the 250th? What are its borders and boundaries? How far does it extend 
geographically, temporally, topically? Who deserves inclusion? 

Digital history might answer some of these questions. From online classrooms 
to virtual museum tours, digital engagement with American history requires a rein- 
terpretation and rescaling of datasets and informational repositories. In our reflec- 
tion on the different dimensions of scaling history — of past and present, micro and 
macro, individual and state — we propose to create a digital history project that 
maps runaway slave advertisements from the eighteenth- and nineteenth-century
United States placed by members of the Founding Generation. By examining a sin- 
gular historical source from a particular community, we narrow the scope of our 
discussion to storytelling, examining perspective shifts from broad, general narra- 
tives to marginalized communities and individual experience. The shift from 
macro to micro further illuminates issues of how sources and data, often missing, 
are manipulated in historical narrative reconstruction. From there we can engage 
with visions of what scholars can do when history is made digital. 


U.S. Congress to end the slave trade and provide means for the abolition of slavery. The petition
was rejected on the grounds that the U.S. Constitution limited Congress's power to end the trade and
emancipate enslaved people. During the debate over Missouri statehood in 1820, in a letter to
John Holmes, Thomas Jefferson expressed the struggle between upholding national ideals and
slavery's expansion in the infamous line, “we have the wolf by the ear, and we can neither hold
him, nor safely let him go”. Thomas Jefferson, “Letter from Thomas Jefferson to John Holmes
(April 22, 1820),” Thomas Jefferson Papers, Library of Congress, accessed June 28, 2023,
http://hdl.loc.gov/loc.mss/mtj.mtjbib023795.

3 See the American Association for State and Local History (AASLH) initiative, “Reframing History”,
accessed June 28, 2023, https://aaslh.org/reframing-history/.

4 See the AASLH’s 250th field guide, accessed June 28, 2023,
http://download.aaslh.org/Making+History+at+250+Field+Guide.pdf.
In the following example, the story of George and Phebe shows us that American
independence was not guaranteed post-1776; that place mattered for communities
seeking refuge and freedom; that belonging and citizenship were long
denied to specific groups; that the Revolution created a nation that viewed Black
persons as commodity, as labor, as chattel; and finally, that we should ask new 
questions of old documents to obtain more truthful interpretations. This is not 
new history; it has been here all along, obfuscated by white supremacy, requiring 
appropriate technologies to reveal it. 

We strive to map these experiences, but we do not want to speak for the expe- 
rience or impart our own thinking onto these historically marginalized persons. As 
witnessed by recent digital projects, this type of interpretation can be done poorly 
and cause further harm.⁵ In our ongoing work, we strive to let the available primary
sources speak for themselves. Instead of offering a revision of these microhistories,
we rescale to locate enslaved agency, not enslaver apprehension.


1 George and Phebe, July 1826 


Ten Dollars Reward 

Ranaway [sic] from the farm of James Monroe, Esq. in Albermarle county, on Monday 
nights last, a negro man named George and his wife Phebe. 

GEORGE is about 30 years of age, strai[-]made, six feet high, tolerably dark complexion, 
had on domestic cotton clothes; but he will no doubt change them. 

PHEBE is about 28 years of age, common size, dark complexion, and when she went 
away was clad in domestic clothes. The above negroes are supposed to be making for the 
county of Loudon, or probably have obtained free papers, and are endeavoring to get to a 
free state. If taken in this county, I will give a reward of Ten Dollars : or Fifteen Dollars if 
taken out of the county and secured in any jail so that I can get them again. 

WM MOON, 
July 8, 1826 For COL. JAMES MONROE 


In the summer of 1826, two people chose to self-emancipate. Their names were 
George and Phebe, and they were husband and wife. They were enslaved by James 


5 In July 2022, digital humanities scholar Jessica Marie Johnson called out the “Runaway slave project”
(http://runawayproject.info/): “Enough with these projects and websites about slavery that
have no faces, no authors, no attribution, and no accountability. No descended engagement. No
Black staff. No Black leads. No Black praxis. Just DH and vibes”. We acknowledge and respect Johnson's
anger. We aim to create a DH project that is transparent, transformative and non-extractive
of available and rapidly digitized sources. See Johnson's July 21, 2022 Twitter feed, accessed June 28,
2023, https://twitter.com/jmjafrx/status/1550072236928909317.
Monroe, fifth president of the United States and Revolutionary War veteran, at his
Albemarle County, Virginia, plantation then known as Highland. As of this writing,
this runaway slave advertisement is the only known documentation of George and
Phebe's existence. However, there is much to learn from the 135-word text.

Runaway adverts provide physical and relational details about enslaved persons.
“Col. James Monroe” placed a ten-dollar reward for the return of Phebe and
George. Their provided ages were approximate: Phebe “about 28 years of age”
and George “about 30”. Some of the physical description is vague and nonspecific:
Phebe is “common size” while George is “straight made”. However, other details
are more particular. Complexion, height, and clothing are noted for both individuals,
as well as their possible final destination, “the county of Loudon”, where Monroe
owned another property. The ad assumed Phebe and George went there
“endeavoring to get to a free state”. Interestingly, the ad also suggests the two may have
“obtained free papers” to ensure easier travel. This specific line raises two questions:
who could have supplied them with such documents, and what networks of
freedom did they operate within?

In the 1820s, freedom seekers like Phebe and George were most likely heading 
north. However, it is noteworthy that they ran away together. It was easier to 
evade capture alone. The fact that they chose to self-emancipate as a couple pro- 
vides insight into their relationship and desire to remain together. Historian An- 
thony Kaye's work defines enslaved neighborhoods by plantation lines and kinship 
ties that crisscrossed physical and psychic boundaries (Kaye 2007: 4-5). Runaways 
were not often accepted into new neighborhoods; they often returned to their old 
neighborhoods where help was easier to obtain (Kaye 2007: 129, 133). Was this also 
the case for George and Phebe, as they absconded to a county where their enslaver 
also owned a plantation? Kaye's academic work helps us consider George and 
Phebe within a larger community: 


Despite planters' attempts to control mobility [. . .] slaves forged enduring bonds to adjoin- 
ing plantations [. . .] extended networks of kinfolk, friends, collaborators, and Christians; 
gave permanence to their neighborhoods by creating and re-creating the bonds that held 
them together, even as slaveholders constantly sold people in and out of the place. By press- 
ing social ties across plantation lines, in short, slaves attenuated the power relations of slav- 
ery and cleared some ground for themselves to stand on. (Kaye 2007: 6) 


George and Phebe's advertisement speaks to agency, resistance, community and 
scale. We learn about a married couple's commitment to a life free from enslave- 


6 “Ten dollars reward,” The Central Gazette (Charlottesville, Virginia) July 15, 1826, Special Collections,
University of Virginia, accessed June 28, 2023,
https://encyclopediavirginia.org/george-and-phebe/.
ment, family separation and forced labor. We see an explicit decision to take a
specific journey. We see geography relative to where George and Phebe came 
from and where they may have gone. We can also imagine the weight of their 
decision to self-emancipate. The choice came with bodily and psychological risks 
that may have weighed heavily on both individuals. 

When studying the United States, the scale of chattel, race-based slavery is mas- 
sive. Slavery infiltrated every aspect of public and private human experience, and 
we see its afterlife in contemporary racism deeply embedded in medicine, politics, 
the economy and the criminal justice system, to name but a few (Hartman 2007: 6). For
George and Phebe, the scale of slavery begins as movement between locations, such
as fleeing Albemarle County for Loudoun County. It also connotes the immense and
claustrophobic scale of racialized power dynamics in nineteenth-century Virginia.
Finally, and most expansively, the scale of slavery acts as a theoretical concept to
explore how historical knowledge is produced by zooming in on one specific hegemonic
narrative. In what Michel-Rolph Trouillot terms “a particular bundle of silences”
(Trouillot 1995: 27), George and Phebe’s individual journey to self-emancipation
is rendered as archival silence, obfuscated by collective narratives of surveillance
and recapturing of fugitives from slavery. By recognizing what scale means discur- 
sively, we “encourage fresh thought” about these runaway slave advertisements 
and shift focus from enslaver oppression to enslaved agency (Ball 2010: 11). 


A note on terminology 


Before we continue the discussion of rescaling digital history, it is necessary to
briefly address the terminology and language, particularly of the history of
American slavery, that we will use throughout this chapter. In 2018, P. Gabrielle
Foreman and other senior slavery scholars created a community-sourced document
to guide discussions of slavery in its historical and global context. The document
is a call to action for scholars to acknowledge the evolving ways that
slavery and those kept in bondage and oppression are analyzed. Considering which
language to adopt and which to avoid confronts the brutality of the system and restores
agency to those who were enslaved.⁷

The frequent terms used to describe enslaved people who endeavored to be 
free are “fugitive”, “runaway”, and “self-emancipated”. The documents in the dataset we are using


7 P. Gabrielle Foreman, et al., “Writing about slavery/teaching about slavery: this might help,”
community-sourced document, accessed May 14, 2022,
https://docs.google.com/document/d/1A4TEdDgYsIX-hIKezLodMIM71My3KTNOZxRvOIQTOQs/mobilebasic.
for our digital history project have been historically called “fugitive slave advertisements”
and “runaway slave advertisements”. The Fugitive Slave Act, first codified
in the United States Constitution and then refined in the 1850 legislation,
cemented this language in the historiography of slavery. The Act demanded the
return of enslaved individuals, even from free states, and legally required enslaved
people to stay enslaved. It allowed federal commissioners to determine the fate of
alleged fugitives without benefit of a jury trial or even testimony by the accused
individual. The historical term “fugitive slave ads” denotes a racial divide between
the unquestioned privileges of white Americans and the unchallenged illegality
of Black Americans who lived and labored in the United States to be free. On
freedom as a condition by which a subject may have a body - as opposed to a 
captive who is flesh — “self-ownership or proof of being able to own (and there- 
fore use) property emerged as part of an uniquely U.S. racial history of citizenship 
and liberal subjectivity” (Atanasoski and Vora 2019: 194). By criminalizing the act 
of seeking freedom for a selected group, usage of terms like “fugitive”, even if it is 
historically accurate, sets the enslaved body as the site of unfreedom that de- 
prived women and men who were enslaved of the right to consent. Enslaved peo- 
ple's self-emancipation constituted not just a moment of refusal, but also an 
affirmation of agency, of claiming ownership over themselves and their own in- 
terpretation of slave codes and manumission laws. “Self-emancipation”, then, re- 
flects the decision and act of the enslaved individual to pursue freedom outside 
of the realm of legal means of emancipation. 

We have chosen to use the term “runaway slave advertisements” to reconcile
issues of scaling historical context between narratives of oppression and
narratives of agency. “Runaway” carries notes of illegality for individuals who
sought self-emancipation. However, unlike “fugitive”, the term “runaway” has
historically been used in advertisements for all forms of escaped labor. The earliest
such advertisements in the Virginia Gazette, published from 1736 to 1780, were for
missing indentured servants, often Irish or Scottish. A 1713 notice in the Boston 
News-Letter, widely acknowledged as the first continuously published newspaper 
in the American colonies, listed a runaway manservant alongside advertisements 
for an enslaved Black woman and an enslaved American Indigenous boy.8 The
term “runaway” encompasses a wider breadth of historical subjects and
allows us to read resistance in and through the advertisements when we address
the authorship of these historical documents. 


8 “Advertisements from the Boston news-letter, 1713,” ANCHOR: A North Carolina History Online
Resource, accessed March 23, 2022, https://www.ncpedia.org/media/advertisements-boston-news. 
2 Runaway slave ads: A brief historiography


Humanities scholars have long examined runaway slave advertisements, but we 
identify a scholarly shift in understanding enslavement through them. Newspapers 
are a technology long studied for their contemporary perspectives on historical 
events. Runaway slave advertisements included ads placed by enslavers, overseers,
estate administrators and jailers who held enslaved individuals in their custody
under the construct of people as property, a construct that we acknowledge within
its historical context but do not endorse. As runaway advertisements became a
staple of eighteenth- and nineteenth-century American newspapers, their content
became standardized. Since the goal of these ads was to facilitate the recapture of
the escaped enslaved - and therefore, labor - standardization meant faster
printing and transmission by word of mouth, as the contents could be read more
easily in public gathering places like local shops and taverns. We have long known about
runaway slave ads, but when researchers ask different questions about them, we 
can identify different answers. 

In 1974, Lathan Windley conducted research based around the runaway adver- 
tisements of Virginia and South Carolina, later published in 1995 in A Profile of Run- 
away Slaves in Virginia and South Carolina from 1730-1787. For many historians, 
Lathan Windley's work was groundbreaking. His search through the multiple edi- 
tions of the Virginia Gazette and related newspapers of similar name revealed 1,276 
runaway advertisements throughout the publication's history. Windley focused on 
“personal variables” and the escape specifics, transforming these advertisements
into datasets for scholars to use in new ways. It is a spectacular collection devoid of 
interpretation, though it elevates enslaved persons as the primary focus of study. 

Windley's analog database of runaway advertisements seemingly set a standard
for how later digital databases have cataloged and structured these
advertisements. The two most prominent are “North Carolina runaway slave
notices, 1750-1865” from the University of North Carolina Greensboro and “Freedom
on the move” from Cornell University. Both digital databases expand Windley's
initial work to encompass greater geographic dimensions and scope, recognizing
the potential of runaway advertisements as historical sources.

“North Carolina runaway slave notices”9 was one of the first of its kind to
provide a comprehensive digital archive of runaway slave advertisements.
Although regionally focused, predominantly on advertisements in and of North
Carolina, and drawn from published works, namely Freddie L. Parker's Stealing a Little


9 “North Carolina runaway slave notices," University of North Carolina Greensboro, 2011, accessed 
March 15, 2022, https://dlas.uncg.edu/notices/. 
Freedom: Advertisements for Slave Runaways in North Carolina, 1791-1840 and
Lathan Windley’s Runaway Slave Advertisements, it sets a standard for this type 
of digitization project that makes its materials accessible to all users through its 
easy-to-use interface. Before, these ads were available in bound volumes and mi- 
crofilms, not in an online open access format. As the project was done in collabo- 
ration with the University’s library, it is an example of how histories are made 
public using library digitization, cataloging and transcription. It is important to 
remember that libraries and archives are also scalable sites of public history. Its 
focus on the usability and sustainability makes the project open for all types of 
scholarly interventions and educational engagement because the database itself 
does not attempt to interpret or create its own narrative and intellectual work. 

Similarly, “Freedom on the move”,10 created in 2017, is the largest digital
collection of runaway slave advertisements published in North America. The database
is sustained through its extensive funding, its expert team and crowdsourcing, its
dominant source of labor. The scope of the project extends the discussion of
enslavement beyond a solely Southern phenomenon. The system of surveillance, labor
management and capture of enslaved people was not confined to the Deep South, but
was embedded in American capitalism (Johnson 2013; Baptist 2014; Beckert 2014). The
project illustrates a broader picture of the process of self-emancipation by highlighting the
mobility and networks that enslaved people had within the system meant to entrap
them. Enslavers knew this too, which is why they placed ads beyond their own
region, in places up North.

That most digital humanities projects take place at higher education
institutions is unsurprising. “North Carolina runaway slave notices” was funded
through grants from the Institute of Museum and Library Services (IMLS) and
North Carolina ECHO. “Freedom on the move” was also funded through grants
from the National Endowment for the Humanities, the National Archives and Cornell
University. As it is a collaborative project between several institutions of higher 
education including University of Kentucky, University of Alabama, University of 
New Orleans and The Ohio State University, the labor of compiling, transcribing 
and maintaining the project is shared. One of the advantages of creating a na- 
tional database of runaway advertisements through collaborative effort that 
moves beyond simply collecting and transcribing, is the ability of such projects to 
capture a more complete narrative of an individual enslaved person on their 
quest for self-emancipation. Included might be the ability for these projects to 
trace runaway individuals over time to show multiple attempts to escape to free- 


10 Edward Baptist, William Block, et al., “Freedom on the move,” Cornell University, 2017, accessed
March 15, 2022, https://freedomonthemove.org/. 
dom, the length of time of their escape, their success or recapture, etc. A limiting
factor of existing databases (some of which is due to a lack of available materials)
is that most enslaved persons seeking freedom are reduced to a single
advertisement.

Additionally, digital pedagogy does not necessarily happen when a project or 
scholarship is made digital. Projects like “North Carolina runaway slave
notices” and “Freedom on the move” are, at their current stage, digital repositories that
link educational and informational networks through content exchange. They
have yet to become fully realized platforms that educators, students and
scholars might use to facilitate digital learning. How might we build on and
harness existing digital history projects and how might we depart from previous
schools of knowledge? 

Windley’s close reading spurred scholars to think expansively about
self-emancipated individuals. Over twenty years ago, David Waldstreicher first
argued that “[r]unaway advertisements [. . .] were the first slave narratives - the
first published stories about slaves and their seizure of freedom”. In these texts,
Waldstreicher sees how an enslaved human being escaped and how their skill
sets might enable their freedom quests. Though written by enslavers, these
advertisements show “slaves’ capitalizing on the expectations of masters by
contravening these roles” (Waldstreicher 1999: 247-248). Other scholars paid attention to
what was not printed. For example, Marisa Fuentes's work reminds us that “the
archive encompasses another space of domination” for enslaved women. She
argues for an examination of what is “emanating from the silences within the
runaway ad”, specifically how historical location and physical description reveal the
vulnerability of self-emancipated women not explicitly addressed in the
advertisements (Fuentes 2016: 15). Fuentes reminds us of what remains unwritten. We
should consider the risk, trauma and potential violence when running away 
(Fuentes 2016: 42). 

Scholars also unite existing methods that show the complex ecosystem of
enslavement, in biography, theater and geography. Erica Armstrong Dunbar's study
arose from a runaway slave ad placed by the first U.S. president, George Washington. He
sought Ona Judge, who self-emancipated while in Philadelphia, without any
“suspicion [. . .] or provocation to do so”, to use the advertisement's language. Dunbar
uses contemporary correspondence and newspaper accounts to “reintroduce Ona
Judge Staines, the Washingtons' runaway slave”. Though we initially learn about
her through an enslaver, Dunbar conducts a biographical recovery that shows the
complexity of enslavement and the desire for freedom and agency (Dunbar 2017:
xvii). Antonio T. Bly builds on Waldstreicher to read “advertisements for fugitive
slave” as “complex living pictures or tableau vivants” that reveal “short vignettes”
of courage, bravery, agency and co-authorship when enslaved individuals were
denied literacy. Bly uses the language of dramaturgy to illuminate the “concealed”
narratives: “By running away, slaves compelled their masters to respond” (Bly 2021:
240-241). Christy Hyman studies runaway slaves as a human geographer. She looks
at where enslaved persons went to “build a spatial model of fugitivity” that centers
“enslaved placemaking” rather than reliance on colonial mapping paradigms.11
Hyman uses runaway slave ads to show how fugitives moved through “punitive
landscapes” and facilitated or created new networks of geographic knowledge.12

The changing discourse in how historians examine slavery and acts of self- 
emancipation recognizes runaway slave advertisements as a mechanism of surveil- 
lance that monitored and traced not only the escaped enslaved, but their network 
of support as well. Enslavers often listed within the advertisements potential
locations to which they presumed their property might have fled. This created a
communication network between enslavers, slave catchers, overseers, jailers and everyone
vested in upholding the racial order of slavery to monitor, identify and recapture
so-called “fugitive slaves”. Yet, these geographical markers pointed towards family -
wives, children, husbands, parents - and relatives in neighboring counties or
acquaintances who might endeavor to help the enslaved person escape North. Their
need to stay close to their loved ones confronted the forces that broke up the fam- 
ily unit in systems of slavery. This formed a basis for a new subjectivity reinforced 
through kinship and community support. Their ability to evade recapture for 
months, years and even forever attests to the strength of the network of care and 
mutual aid between enslaved communities (Johnson 2020). 


3 An interdisciplinary approach to mapping 
self-emancipation 


In following and mapping the routes of self-emancipation documented in run- 
away advertisements, our proposed digital history project seeks to reconstitute 
agency and refute geographical practices of colonialism in documenting the histo- 
ries and lives of those enslaved. Colonialism brings with it a fracturing force - 
most evident in the colonial practice that geographically breaks up parts of the 


11 Christy Hyman, “GIS and the mapping of enslaved movement: the matrix of risk,” Environmental
History Now, August 19, 2021, accessed March 25, 2022, https://envhistnow.com/2021/08/19/
gis-and-the-mapping-of-enslaved-movement-the-matrix-of-risk/. 

12 Christy Hyman, “The Disappearance of Eve and Sall: escaping slavery in North Carolina,” Black 
Perspectives African American Intellectual History Society, October 6, 2020, accessed June 28, 2023, 
https://www.aaihs.org/the-disappearance-of-eve-and-sall-escaping-slavery-in-north-carolina/. 
world via color-coded maps and redlining practices in urban planning. American
chattel slavery facilitated the segmentation of enslaved families through “being 
sold down river”, acts of self-emancipation, sexual violence and coercive repro- 
duction and death. Within the archives of the Federal Writers’ Project in the Li- 
brary of Congress sit thousands of records and accounts of forced separations 
used by enslavers as part of the mechanism of control and the multitudes of ways 
enslaved people have tried to mediate and transcend those circumstances. The 
Library of Congress is but one archive with the means and resources to preserve 
such stories from the margins. Runaway slave advertisements present another ar- 
chive to conduct the same work. 

Historian Elizabeth Maddock Dillon noted that authorship is an interaction
between multiple roles. While she is referring to the structure of the archives that
often reproduces colonialist narratives by prioritizing materials most prominent in
the larger narratives of the state or national story, it is useful and necessary to
disembed narratives literally placed at the margins - in footnotes, appendices or
altogether unmentioned - from the colonialist context.13 Antonio T. Bly similarly asserted
that the silent actions of the enslaved in forcing their enslaver to respond to their
escape, in refusing a name given by the enslaver, and in the identifiers they took on
imply their co-authorship in the subtle subtext of the runaway advertisement (Bly
2021: 246-247).

Although runaway advertisements varied in length, they commonly featured 
the enslaved individual’s physical appearance, speech, manner of dress, behavior 
and, when applicable, trade skills — all attributes that were thought to easily identify 
him or her and would lead to their recapture. The narrative biographies offered 
within runaway advertisements enable a fuller understanding of these documents. 

One of the earliest digital history projects to specifically examine runaway 
advertisements was “Geography of Slavery in Virginia” from the University of Vir- 
ginia in collaboration with its Center of Technology and Education. Funded through 
grants from the National Endowment for the Humanities and Virginia Foundation 
for the Humanities, the digital project, built on an HTML platform, not only compiled a
repository of runaway slave advertisements in Virginia, but also incorporated static
maps of the routes of enslaved individuals and timelines. The project, built in 2005
and intended to host collections of student projects on slavery in Virginia, is dated
and has not been updated or maintained toward completion.


13 OIEAHC Lecture Series, Elizabeth Maddock Dillon and Alanna Prince, “Decolonizing the ar- 
chive,” April 15, 2019. 

14 Tom Costa, “Geography of slavery in Virginia,” University of Virginia, 2005, accessed March 15, 
2022, http://www2.vcdh.virginia.edu/gos/. 
“Geography of Slavery in Virginia,” like many digital humanities works done
as individual projects, emphasizes the microhistories of its subjects and usage of 
digital tools to extrapolate new interpretations and storytelling. It is hindered by 
lack of necessary labor and resources to carry it forward. This is a component of 
scaling digital history and humanities projects that cannot be overlooked. The 
technology available for the project limited the visualization of interpretations 
and how its microhistories might interact; for example, each enslaved individual is
mapped separately. Nevertheless, the project used a digital medium to consider
the same questions and build on work previously done. It integrated digital tech- 
nology to examine alternative and overlooked sources. It facilitated digital learn- 
ing for a project of its time and technological period. 

As humanities scholar Patrik Svensson observes of the state and future of pedagogy in
the humanities, the core values of “a predominantly textual orientation and a
focus on technology as tools”15 seem to ignore the interdisciplinary nature of
digital humanities work that moves beyond the fields of English and history. Luckily,
new digital tools from Kepler.gl to ArcGIS are offering new ways of thinking 
about and doing history. Data visualization is emerging from textual analysis to 
provide nonlinear, nonconventional methods of reinterpreting existing datasets 
and archives. Different methods for interpretation offer new sets of questions we 
may not have thought of asking before. Data visualization shows ghosts within 
the archives by giving visual presence to missing and erased voices and that 
which cannot be quantified. Through mapping, it provides an added layer of
spatial narrative for large datasets that might otherwise be illegible to historians
and scholars.

Yet, digital technologies cannot be treated simply as tools. They come
with their own sets of limitations and possibilities. Scaling digital history requires
contextual awareness of both content and technology; the two should work in
tandem to take a closer look with the ability to zoom out as needed. Our project
proposes mapping the routes of enslaved people's attempts to self-emancipate to
highlight and focus on individual narratives that can be contextualized within the
national narrative on slavery. Well-documented stories of self-emancipation are
those that have captured national attention, as in the cases of Margaret Garner,
Henry “Box” Brown, and Elizabeth Freeman. Yet, more often they are
compilations of partially told stories that, if fortunate, have been collected into archives
and stitched together to create a more cohesive narrative. Most stories of self- 
emancipation, however, disappear. Rarely do we find or know the outcome of 


15 Patrik Svensson, “Envisioning the digital humanities,” Digital Humanities Quarterly 6, no. 1 (2012),
accessed June 28, 2023, http://www.digitalhumanities.org/dhq/vol/6/1/000112/000112.html. 
whether an individual has made it to freedom or has been recaptured and
returned to slavery. How do we continue these narratives when so much is missing?
More importantly, how do we fill in the missing pieces of the story without
speaking for those enslaved whose stories remain perpetually co-opted and revised?16

The propensity of scholarship to sideline the microhistories of enslaved
people due to the impasse of evidentiary gaps - missing and unfound data in the
archives - might be mitigated by layering and juxtaposing them with existing and
available environmental, ecological and geographic data. This layering of temporal,
spatial and historical data utilizes the concept of thick mapping (also called deep
mapping) in the digital humanities to construct and reconstruct fragmented social
and cultural moments (Presner et al. 2014: 15-18). Hard data (climate, demo- 
graphics, geolocations) layered with soft data (images, events, oral histories) cre- 
ates a more cohesive intertextual reconstruction of fragmented and missing 
narratives. 

This evidentiary parsing of the past requires an interdisciplinary approach to
reconstructing individual narratives of seeking self-emancipation and necessitates
the incorporation of other forms of data. The materiality of mapping, when made
digital, allows more networked relationships among a “multiplicity of layered
narratives, sources, and even representational practices” (Presner et al. 2014: 17).
If we view runaway advertisements as a form of micro-maps, revealing the desti- 
nations and projected routes enslaved people took, other establishing factors be- 
come evident as new geographic markers within the narrative: 

1. Sites of transgression such as plantations, farmhouses, towns and cities, 
churches, commemoration events, etc. 

2. Sites of surveillance such as mills, markets, general stores, jails, etc. 

3. Sites of communications such as taverns and inns, waterways, ports, printing 
offices, etc. 

4. Sites of punishment such as jail stocks, auction blocks, plantations, etc.

5. Sites of safety and sites of danger such as safehouses on the Underground
Railroad, homes and communities of freed Black people, woodlands and swamps, weather
conditions, presence of wild animals, etc.
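
One way to picture how these five categories might be recorded alongside a single advertisement is a small data model. The following is a minimal sketch in Python, not a description of any existing project's implementation: the class names (`SiteType`, `Site`, `MicroMap`), the field names and the advertisement identifier are all hypothetical, and the entries are placeholders rather than historical data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class SiteType(Enum):
    """The five kinds of geographic marker listed above."""
    TRANSGRESSION = "transgression"
    SURVEILLANCE = "surveillance"
    COMMUNICATION = "communication"
    PUNISHMENT = "punishment"
    SAFETY_OR_DANGER = "safety_or_danger"


@dataclass
class Site:
    name: str            # label for the place or condition
    site_type: SiteType  # which of the five categories it belongs to
    source: str          # archive, map or newspaper the evidence comes from


@dataclass
class MicroMap:
    """All sites layered onto one runaway advertisement (hypothetical model)."""
    advertisement_id: str
    sites: List[Site] = field(default_factory=list)

    def layer(self, site_type: SiteType) -> List[Site]:
        """Return only the sites belonging to one category."""
        return [s for s in self.sites if s.site_type is site_type]

    def missing_layers(self) -> Set[SiteType]:
        """Return the categories for which no evidence has been recorded."""
        present = {s.site_type for s in self.sites}
        return set(SiteType) - present


# Placeholder entries only; these are not drawn from any archive.
example = MicroMap(advertisement_id="placeholder-ad-001")
example.sites.append(Site("placeholder mill", SiteType.SURVEILLANCE, "placeholder map"))
example.sites.append(Site("placeholder tavern", SiteType.COMMUNICATION, "placeholder newspaper"))
```

Recording a source for every site keeps each layer traceable to the material from which it was drawn, and `missing_layers` makes evidentiary gaps visible instead of hiding them.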


16 The aforementioned Runaway Project’s Twitter Account utilizes artificial intelligence to generate 
tweets based on runaway advertisements in a series of tweets called “Tweets_from_runaway_slaves.” 
An algorithm is used to extract information from each inputted advertisement, which is then
restructured and reworded into first-person accounts from the enslaved individual, using artificial
intelligence to literally speak for enslaved people from information written by their enslaver. See Runaway
Project 2020 Twitter feed, “Tweets_from_runaway_slaves,” accessed June 28, 2023, https://twitter.com/ 
FromSlaves. 
Returning to George and Phebe’s story, we learn more about their escape from
bondage by examining environmental details from early July 1826. Many come 
from previously digitized maps, newspapers and personal correspondence. From 
one 1827 map, we can list the many counties crossed: Orange, Madison, Culpeper 
and Fauquier. This same map shows the industrial and natural topography: busy
turnpikes, rushing waterways and active mills along George and Phebe’s purported 
flight along the edge of the Blue Ridge Mountain system (Boye et al. 1827). The 
natural landscape paradoxically offered protection while camouflaging hazards. 
An encounter with domesticated and/or wild animals could mean further danger 
(Silkenat 2022). From newspaper accounts, we know it was rainy across the
region.18 From the July 8, 1826 advertisement, we know George and Phebe left on
“Monday night last” - or Monday, July 3. This would have been the night before the
50th anniversary celebrations for the signing of the Declaration of Independence. It 
was also the night before Thomas Jefferson, author of the Declaration, died at his 
home approximately two miles from Highland. Did George and Phebe also know
that Jefferson, the old friend and mentor of their enslaver, James Monroe, was on his
deathbed? Did the Highland household's preoccupation with this impending death, in addition to
the “unfavourable weather,” offer the perfect cover for the couple to escape
together? Uniting these factors via digitized primary sources allows for a nuanced
read of the runaway slave advertisement. In this way, the ad almost becomes a 
map of a moment. It is a microhistory where commemoration, weather, and mor- 
tality link up. Under this situation, conditions may have been as good as they 
would ever get for George and Phebe to escape together. What if we could replicate 
this layering of increasingly available sources for other runaway slave advertise- 
ments, thus creating fuller depictions of enslaved agency and interconnected his- 
torical processes? 
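
The dating of the escape can be checked with a short calculation. The sketch below, in Python, assumes that “Monday night last” is counted back from the advertisement's July 8 date; the function name `last_weekday_before` is illustrative only.

```python
from datetime import date, timedelta


def last_weekday_before(reference: date, weekday: int) -> date:
    """Most recent date strictly before `reference` falling on `weekday` (Monday = 0)."""
    days_back = (reference.weekday() - weekday) % 7
    if days_back == 0:
        days_back = 7  # "last Monday" on a Monday means a week earlier
    return reference - timedelta(days=days_back)


ad_date = date(1826, 7, 8)                  # date of the advertisement (a Saturday)
departure = last_weekday_before(ad_date, 0)  # the Monday before it
print(departure.isoformat())                 # 1826-07-03
```

Under that assumption the departure falls on July 3, the eve of the July 4 anniversary, which matches the reading given above.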

George and Phebe's story, in many ways, is not unique. Thousands of
enslaved people have sought a path to freedom. Most of these stories are lost,
erased and fragmented within the historical archives. Often what remains is
described only within a single short runaway advertisement. An interdisciplinary
approach to creating micro-maps from each advertisement can certainly function as
a way to scale history and combat the fragmentation of evidence to bring these
individuals' stories back into our collective history and memory. Throughout our
collaboration, we discussed scale and scalability in a variety of modes. First, we
looked at fugitive slave advertisements in a geographic or spatial mode: where


17 See also Map of Virginia, accessed June 28, 2023, https://www.loc.gov/item/2012589665/. 
18 “The Jubilee,” The Richmond Enquirer, July 11, 1826; “Chesterfield celebration,” The Richmond
Enquirer, July 18, 1826. 
were enslaved human beings headed, and where were they coming from? This is
often the first level of interpretation within the humanities and humanistic social 
sciences. Second, we continued with the temporal: when did these advertisements 
get placed? The “where” and “when” were both determined by our available ar- 
chive, the Virginia Gazette, as well as our primary site of interest. 

From concrete modes, we moved into the emotive or subjective realms of 
scale. With careful close reading, these advertisements connote a relational or emo- 
tional scale, as they are vignettes of desperate escape and feared re-enslavement. 
Within this emotive space, we also examined community and family within each 
text, as evidenced in the above example of George and Phebe: couples running 
away together, conceivably into networks of care and assistance. Finally, perhaps 
most obviously, we looked at these advertisements as relational, between enslaver 
and enslaved. These advertisements demarcated individuals by skin color, dress, 
skillset and kinship. There is a form to these texts. They indicate the system of
enslavement as a process where certain human beings were denied their own bodies
and their own systems of knowledge, and could be ruthlessly and violently
apprehended to maintain the free/unfree binary that undergirded all aspects of life in
the United States. 

In addition to these modes of historical scale, we also considered how digital 
projects introduce additional scalability, primarily through access and funding. 
This is particularly tied to institutional support and/or cross-institutional
collaboration. When reviewing projects, we considered their relative impact and
influence: what audiences were they reaching and how easy were their interfaces to
navigate? Much of this accessibility is contingent on funding mechanisms that 
sustain both the interface and the content. Access, in particular, has become a 
fundamental pedagogical tool amid an ongoing pandemic where virtual program- 
ming has become central to spaces of instruction, including school classrooms 
and museum spaces. 


4 Making histories public 


Our proposed project translates the aforementioned academic history into public 
history — scaling content for a broader audience. Public history does not water 
down content. Rather, it shifts according to its audience. However, this does not 
make public history an equitable space of knowledge production; it too must 
reckon with how its own disciplinary history in the United States centered on 
whiteness for over a century (Meringolo 2012: iiv). Under this rubric, histories 
were untold. If artifacts were on display, they were often linked to white
historical personages or elevated for their “physical qualities” (Meringolo 2012: 72).
Without observable presence in archives, many historical narratives disappeared 
into their respective oral traditions. Sites of public history were constructed to 
privilege white audiences’ understanding of the United States as a superior and 
innocent global entity. By ignoring Thomas Jefferson’s rape of Sally Hemings or 
Andrew Jackson’s slaughter of Native Americans, public history absolved the 
Founding Generation and sanctified the abstract American values of liberty,
justice and equality as inalienable rights for white men only.19

Through the efforts of historians, anthropologists, archivists and community 
activists, the white supremacist monolith of public history now shows cracks — 
often most visible in online venues. How does public history relate to digital his- 
tory? Ideally both fields bring audiences into private and public spaces — trans- 
forming the idea of site visitation — to encourage cross-institutional projects, 
provide models for future projects and “to democratize history: to incorporate
multiple voices, reach diverse audiences, and encourage popular participation in
presenting and preserving the past”.20 Public history and digital history migrate across
multiple mediums, and with varying results. Trial-and-error operates in both fields 
too, and some interpretative devices work better than others. Digital public history 
can help undo entrenched and colonialist orderings of the past. Uniting history 
done beyond academia with available and accessible technologies helps to
challenge colonial epistemologies and share new interpretative frameworks to privilege
the historically marginalized.21 Most importantly, digital history shows that revision
is a part of any good historical process, public or otherwise. The digital might never 
be done; it just needs to go live with corrections and updates welcome — encour- 
aged, even. 

We ground our proposed project in the realm of digital public history. We do 
not want to reinscribe racist, colonialist, normative reads of these advertisements 
but use them to display new networks of agency, resistance, migration and com- 
munity (Risam 2018: 4). Despite our framework, rooted in theoretical approaches, 
we have deliberately limited our own use of academic jargon to create an accessi- 
ble, relevant and readable body of literature. This is known history and we are 
privileging another perspective over more traditionally well-documented ones,
such as those of enslavers. By examining runaway slave advertisements from a 
specific time period and a specific place, we also shed light on how the system of 


19 By American, we specifically mean the United States of America. 

20 “Our story," Roy Rosenzweig Center for History and New Media, accessed May 8, 2022, https:// 
rrchnm.org/our-story/. 

21 “Digital history & historiography,” Luxembourg Centre for Contemporary and Digital History,
accessed May 8, 2022, https://www.c2dh.uni.lu/research-areas/digital-history-historiography.

chattel slavery in the United States reverberated across time and space into our
current moment. The implications are tremendous: displaced systems of kinship 
and community knowledge; generational trauma through family separation; and
mechanisms of finding freedom that included new names and new
backgrounds.


5 Conclusion 


For us, the question is clear: how can technology reveal history’s tableaux vivants,
to return to Antonio Bly’s work - the “exploded moments of life” buried within
historical documents? The methodological and pedagogical goal of any digital
humanities work is to avoid replicating colonialist practices and superstructures
in these projects. The danger lies in treating stories as data, which further dehumanizes
and delegitimizes the potential of digital history to make positive change.
We need to be aware of the cultural history of the technology we are using. The 
project has a history that must be acknowledged in order to be undone. 

Let us return to the opening example of George and Phebe. Often the scaling of 
history necessitates the movement across multiple platforms and media. Through 
the printed advertisement, we see their biography unfold despite their enslave- 
ment. We digitally map their relationship, their community centers, their environ- 
ments, their trajectory of self-emancipation. How powerful would it be to use 
available and accessible technology to place George and Phebe on the landscape of
early nineteenth-century Virginia? It would show enslaved human experience beyond
the plantation boundary. How might this boundary-breaking behavior in 1826
Virginia also break scholarly boundaries of historical interpretation in 2022? 

In doing digital history, we cannot lose sight of the humanity found in runaway 
slave advertisements. These were individuals who endured unfathomable violence 
and trauma. What epistemological and methodological approach might we engage 
with in the digital realm to acknowledge the dignity of enslaved persons? In this, 
we look to the landing page for (Un)Silencing Slavery: Remembering the Enslaved at
Rose Hall Plantation, Jamaica: “The Purpose of the (Un)Silencing Slavery at Rose
Hall Project is to respectfully and lovingly remember and hold space for the enslaved
Africans and their enslaved African-born and Caribbean-born descendants
who lived and labored at Rose Hall Plantation in Jamaica”. Before you can explore
names, dates and documents, you must confront this statement centered on your 
computer screen. (Un)Silencing Slavery is “a memorial [. . .] a site of mourning and 
grieving [. . .] a gesture of gratitude and appreciation, and [. . .] a catalyst for the 
ongoing recognition, exploration, and presentation of the enslaved persons of African
descent at Rose Hall”.2 If we acknowledge the dignity of enslaved persons -
and not their potential as data points — we move towards more respectful and 
transformative digital humanities scholarship. 

Where most accounts of George and Phebe began and ended with a single 
advertisement placed by their enslaver, we reassert their agency and voice by 
stressing their conscious movement. By re-scaling focus on their personal jour- 
ney, revealed through the powerful act of self-emancipation, we can see their 
human forms more clearly.


Keep Calm and Stay Focused:
Historicising and Intertwining Scales 
and Temporalities of Online Virality 


Abstract: After explaining why spatialities and temporalities, as well as platforms,
matter in the historicization of virality, this chapter takes the Harlem Shake as a
case study to demonstrate how a scalable and medium reading may make it possible
to reconstruct past virality. It discusses the challenges posed by the flow and
circulation of memes, and the sources and tools that may be used to renew an approach
that benefits from contextualization, one dedicated not only to content and to a semiotic
approach, but also to the containers, communities and stakeholders involved in
these flexible and deeply adaptable phenomena. By intertwining sources from the
live and archived web and by crossing corpora based on the press, the archived web
and social networks (Twitter in this case), the authors shed light on the complexity at
stake in the reconstruction of past virality.


Keywords: memes, virality, Harlem Shake, scalable reading, medium reading 


The history of the Internet and the Web was boosted by leaving behind a rather 
US-centric vision (Russell 2017), in favor of analyses that shed light on more tenu- 
ous aspects, missing narratives (Campbell-Kelly and Garcia-Swartz 2013) and na- 
tional appropriations (Schafer 2018; Siles 2012; Goggin and McLelland 2017). The 
history of digital cultures also benefited from this change in scale. It favored studies
that were more focused on specific domains, be they national, dedicated to some
stakeholders such as early adopters (Paloque-Bergès 2017), fan communities (Horbinski
2018) or previous communication networks (Driscoll 2022). The study of so-called
“Internet phenomena”, such as memes, also requires this type of approach:
they must be anchored both in a general digital movement, supported by techni- 
cal platforms that allow the international, near-instant circulation of content — a 
movement linked to the more general phenomena of the circulation of globalized 
information - and in an approach that fully incorporates the spatial and temporal 
variations of their circulations. This is necessary when it comes to memes, as circulation,
flow and appropriation are inherent in their definition: based on the genetic
metaphor created by Dawkins (1976), the notion relies on the idea of cultural 
units that circulate and are transformed. Memetics is actually a theory based on a 
dissemination model for cultural elements that are able to replicate themselves 
and spread within a cultural space. However, the deeply adaptable, flexible and 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial 4.0 International License.
https://doi.org/10.1515/9783111317779-006

shifting character of memes makes sense in context and this context has to be
reconstructed as it can't remain a blind spot: the semiotic transformations of 
memes are constituted within social groups and follow a logic that touches on the 
very history of the relationships between these groups (Milner 2018). 

This chapter thus takes online virality as its starting point, specifically that of
memes and the question of their historicization,1 to question how they may be
resituated in a context of production and circulation, taking social, spatial and 
temporal logics into account. To do so, a multi-scalar reading must be applied to 
them and full advantage must be taken of both “scalable reading” and “medium 
reading” (Schafer 2019). Indeed, the intent “to move ‘gracefully between micro
and macro without losing sight of either one’ in a collection, to see ‘patterns’ and
‘outliers’, to zoom in and zoom out and to understand what ‘makes a work distinctive’
within a ‘very large context’” (Flanders and Jockers 2013: 17, 30) is key.
However, we will show that the question of scalability is as much about con- 
stantly zooming in and out, switching between distant (Moretti 2007) and close 
reading, as it is about a cross-functional, multiplatform, media-based reading: one 
must also consider the infrastructures and players enabling its distribution/circu- 
lation, as well as the heritagization/preservation of these Internet phenomena. 
This implies complementary levels of reading, which cannot focus solely on content,
but must also dig into the conditions of production, circulation and preservation,
and even into access to data (Dobson 2019). After a discussion of scalable reading
as applied to virality, with specific emphasis on spatial and temporal aspects,
we will propose a case study based on the Harlem Shake.


1 Temporal and spatial challenges 
of online virality 


Several disciplines have shown a particular interest in the study of virality, and no- 
tably of memes, be it semiotics, communication studies or media studies (Shifman 
2014; Milner 2018 and many others). However, these studies often focus on one plat- 
form (YouTube is largely at the heart of Shifman's pioneering work), or on their 
meaning, be it political (Denisova 2019) or semiotic (Cannizzaro 2016; Wagener 
2020). A historical approach, situated and contextualized over the short lifespan of 


1 The historicization of online virality is at the heart of the Hivi research project, “A history of
online virality”, that we are conducting at the University of Luxembourg with the support of the
FNR (C20/SC/14758148). See hivi.uni.lu.

the web's history (which is nonetheless dense, given the extremely fleeting and rapid
virality of certain phenomena), is less often considered. Yet both the spatial and
temporal dimensions - to which scalable reading can be applied — are essential. 
These dimensions are at the heart of all viral phenomena at many levels: at the 
origin of the phenomena, their circulation, resurgence and posterity. 


1.1 Temporality, spreadability and trends 


The temporal dimension is evident in the case of the oldest Internet phenomena, 
whether we think of Godwin's Law (which spread from the first half of the 1990s), 
the Dancing Baby (1996) or the Hampster Dance (1998). These three examples,
among others, are admittedly different in nature. The first phenomenon, which
circulated in the early 1990s in Newsgroups, initially in text form, states that “as
an online discussion grows longer, the probability of a comparison involving Hitler
approaches 1”;2 the second owes its success to its 3D realization, which was
remarkable for the time; the third to the catchy nature of the Disney music and
the animation created in 1998 by a Canadian student, whose website rapidly
gained in popularity. Their origin is well documented in the press, as well as
on platforms such as Know Your Meme (KYM) or Wikipedia. There is a variety
of actors, from Deidre LaCarte's individual composition of hamster GIFs on
Geocities to the more complex journey of the Dancing Baby, which was created in 
1996 by Michael Girard and Robert Lurye, with a program designed by Character 
Studio and used with 3D Studio Max and a Microsoft computer. The popularity of 
the Dancing Baby benefited then from its format's adaptation and circulation 
through e-mail by Ron Lussier, who worked for LucasArts, and, from 1996 too, 
from its transformation by John Woodell, a software engineer who created a GIF 
(Pereira 2022). 

These pioneering phenomena are thus deeply rooted in a technical context
(i.e., development of GIFs and animated images) and in uses (BBS, emails, popularity
of Geocities, etc.). It is difficult to get an idea of their real circulation in this
first age of the Web beyond the indirect traces that allow us to grasp their influ- 
ence. Some phenomena remain elusive and are not preserved, such as the chain 
emails that existed even before virality developed on the Web. However, when 
Mike Godwin published an article in Wired in 1994 referring to his law, there was 
evidence of a growing popularity, which the publication of the article could only 


2 See Mike Godwin, “Meme, Counter-meme”, Wired, 10 January 1994.

reinforce. The Dancing Baby, which has become a symbol of the vernacular Web, appeared
in the series Ally McBeal and was thus further opened up to the general
public, while Lussier began to wonder how to deal with the re-uses of his creation 
in the face of the Dancing Baby's success (McGrath 2019: 511). The Hampster Dance 
and its animated GIFs (Eppink 2014) gave rise to commercial derivatives (i.e., “The
Hampsterdance Song”, which was produced by the Boomtang Boys and released
on July 4, 2000). Such elements help to measure the popularity of this kind of content.
A distant reading applied to Geocities would also undoubtedly shed light on some
concrete uses of the Dancing Baby or hamsters in online pages, as Ian Milligan
(2017) did, for instance, for the bounding tiger from the Enchanted Forest community.
Milligan's study is based on archived Geocities pages. Such a study is only possible
thanks to image extractions on a determined, circumscribed corpus, whereas
the Dancing Baby, for example, circulated via email and on very different websites.

Whether it is these pioneering memes or other Internet phenomena that span 
time, they are often presented in the form of detailed chronologies, either on the
KYM platform or on Wikipedia.4 This is the case of the Rickroll, a famous prank
inspired by an ancestor, the Duckroll, which first appeared in 2007. The Rickroll 
involves inviting online users to click on a hypertext link by proposing attractive 
content, in order to direct them to the song (and often the video) Never Gonna Give 
You Up by Rick Astley, released in 1987. The White House and Anonymous got
on board with the Rickroll,5 making it more popular. These demonstrations sometimes
happened in public space: in 2014, the group Foo Fighters “rickrolled”
an anti-gay demonstration organized by the Westboro Baptist Church. These cases
allow us to monitor the visibility of viral phenomena, but remain focused on
“major” events, as it is difficult to grasp some more discreet uses.

Though we can ascertain success by the number of views regularly men- 
tioned in KYM and Wikipedia or by their circulation in traditional culture (song, 


3 Using Images to Gain Insight into Web Archives, accessed July 11, 2023,
https://ianmilli.wordpress.com/2014/08/11/using-images-to-gain-insight-into-web-archives/.
4 The reason why these two rather different platforms are mentioned in parallel is that both have 
chosen an encyclopaedic style, a collaborative model and are interested in Internet phenomena. 
5 In 2011, the White House Twitter account responded to an online user by referring him to the 
Rick Astley clip. Praetorius Dean, “The White House ‘rickrolls’ its Twitter Followers”, huffpost.com,
27 July 2011, accessed July 11, 2023, https://www.huffpost.com/entry/white-house-rick-roll-twitter_n_911345.

In 2015, after the terrorist attacks in France, Anonymous flooded pro-ISIS accounts with rick- 
rolls to disrupt DAESH communications on digital social networks. 
6 KYM mentions for the Hampster Dance: “LaCarte told the webzine that over the course of 4
days in March 1999, the site acquired nearly 60,000 new hits. Three months later, it broke
17 million views”. Accessed July 11, 2023, https://knowyourmeme.com/memes/hampster-dance.

series), it is difficult to accurately measure past viewership. Wikishark or Google
Trends may provide some general information.

Wikishark (Vardi et al. 2021) measures the number of views of a Wikipedia 
page, as in the case of the Dancing Baby on English-language Wikipedia since 
2008 (the Wikipedia page dedicated to the phenomenon was created in 2006). 
More recent visits in May 2022 (Figure 1) are probably related to announcements
that HFA-Studio plans to release a digitally restored, high-definition 1/1 artwork by
the original creators as an NFT.7 Google Trends also delivers results, but these contain
noise. A Dancing Baby search will thus refer to a U.S. dance show. The refined
search “Dancing Baby gif” provides some conclusive results, while also referring to
the film Dirty Dancing, whose female character is nicknamed Baby (not to mention
the fact that Google Trends remains a reading tool focused on Google consultations,
which obviously leaves out other search engines and audiences).8

These trends, which have obvious biases (Google consultations, Wikipedia in English,
etc.), remain indicative, and they must be refined by cross-referencing sources.
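
The cross-referencing recommended here can be sketched in a few lines of code. The following Python example is a minimal illustration under stated assumptions, not the procedure used for Figure 1: the two monthly series and their values are invented, and the function names are hypothetical. Each series is rescaled so that its maximum equals 100 (the kind of relative index that Google Trends reports), and the months in which both sources reach a chosen threshold are listed.

```python
def rescale(series):
    """Rescale a {month: count} mapping so that its maximum equals 100."""
    peak = max(series.values())
    return {month: 100 * value / peak for month, value in series.items()}


def shared_peaks(series_a, series_b, threshold=50):
    """Return the months in which both rescaled series reach the threshold."""
    a, b = rescale(series_a), rescale(series_b)
    common_months = sorted(set(a) & set(b))
    return [m for m in common_months if a[m] >= threshold and b[m] >= threshold]


# Invented figures, for illustration only.
wikipedia_views = {"2022-03": 900, "2022-04": 1100, "2022-05": 4000, "2022-06": 1500}
search_index = {"2022-03": 40, "2022-04": 35, "2022-05": 80, "2022-06": 30}

print(shared_peaks(wikipedia_views, search_index))
```

A month that stands out in only one source is a candidate for noise of the kind described above, while a month shared by both is a stronger, though still indicative, signal.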


1.2 Spatiality, transmediality and contextualization 


In addition to the temporal aspects of virality, its spatial aspects are also key: for 
example, Matt Furie's famous creature, Pepe the Frog, was appropriated by the
U.S. far right under Trump, and at the same time used to denounce police violence
in Hong Kong in 2019 (Pettis 2021). It is therefore decisive to grasp the areas of
deployment of a viral culture: presented as globalized, it also has national, local or
even community variations, as well as circulation across several digital spaces.
Looking first at the circulation between platforms and media, there are many 
examples of complex circulation. This is for instance illustrated by the Distracted
Boyfriend meme, a stock photo that became famous in 2017. It initially appeared on Instagram,
inviting users to “Tag that friend who falls in love every month” (Esposito
2017). A few months later its success was assured by its use on Facebook targeting
Phil Collins turning lustfully towards “pop”, embodied by the young woman in the
red dress (recalling the strong links between pop culture and memetics). On Twitter
the image macro was associated with comments for each character, gradually in- 
creasing in abstraction. In addition to the fact that three platforms assured the suc- 


7 See https://edition.cnn.com/style/article/dancing-baby-meme-nft/index.html, accessed July 11,
2023 and https://www.macobserver.com/news/longtime-internet-meme-the-dancing-baby-to-be-minted-as-nft/,
accessed July 11, 2023.

8 Search for “dancing baby gif” in Google Trends:
https://trends.google.com/trends/explore?date=all&geo=US&q=dancing%20baby%20gif.


Fred Pailler and Valérie Schafer 


Figure 1: Wikipedia (en) pageviews from 2008 for the Dancing Baby.


cess of this famous meme, the variations in meaning that appeared over the course 
of these circulations are also worth tracing, along with changes in audience that
did not necessarily overlap. As such, it is essential to look at the platforms that ensure
the circulation and transformation of memes. The logic of such platforms is
not neutral, either in terms of curation, economic model or uses, or in terms of circulation
and affect. Some more specific networks, in particular famous forums
such as Reddit and 4chan, stand out since they often play a role in the early days of
memes. For instance, a search in KYM of cat celebrities distinguishes spaces of creation
and dissemination of these cats and their entry into culture at large, as shown
by the extraction of 1,044 entries devoted to famous internet cats and their analysis
via Gargantext.9

Figure 2: Extract 1 of a visualisation of cat celebrities in Know Your Meme through Gargantext.


9 This study was conducted in March 2022 with the support of Quentin Lobbé (Institut des systèmes
complexes, Paris). We sincerely thank him.

Figure 3: Extract 2 of a visualisation of cat celebrities in Know Your Meme through Gargantext.


Distant reading as applied to these 1,044 entries, which range from Felix the Cat
and Schrödinger’s Cat to Nyan Cat and Grumpy Cat, clearly shows the importance
of the formats (image macro for Grumpy Cat, see Figure 2) and the previously
mentioned platforms (e.g., 4chan, Reddit or Cheezburger for cats, see Figure 3) in
the first steps towards success and virality.

Since the role of these platforms is undeniable in the creation and visibility of
Internet phenomena, it invites us to look closely at the medium that ensures their
dissemination, or even the media at large, since all these phenomena are cross-platform.
Thus, the Rickroll mentioned above circulated on Twitter, on Vine (before
the platform closed in 2019), on forums (specifically video game forums, as the Rickroll
of a Grand Theft Auto IV demonstration was the starting point of its success),
and so on. Grasping all these occurrences is difficult, even when relying on web
archives, for example those of the INA (French audio-visual institute).

Once again, the researchers must be fully aware of the limits of their research 
and of the results and the distant reading offered through the INA's interface (Fig- 
ures 4 and 5): on the one hand, INA is limited in France to archiving audio-visual 


10 “Vine Gets RickRolled: 16-Year-Old Developer Hacks App To Upload Full Rick Astley Music
Video”, International Business Times, 4 June 2013.

content, while since 2006 the BnF (National Library of France) archives “the rest”
of the French “national Web”. Thus, the Twitter accounts followed and archived by
INA are essentially linked to the audio-visual sector (journalists' and channels' ac- 
counts). They do not represent the entire Twitter sphere, though the discussion 
threads do, of course, broaden the content. Moreover, the results contain noise (for 
example, a search for Numa Guy will yield results concerning not just the famous 
Lip Sync cover of the O-zone song, but also excerpts from interviews with a French 
researcher named Guy Numa). Above all, web archives are not intended to be ex- 
haustive, rather representative (Brügger, 2018). In addition, virality spreads mostly 
on several digital social networks and we must often face their lesser archiving 
(Facebook) or total lack of archiving (Periscope, TikTok until recently). There are 
also issues of searchability in the web archives: virality is often difficult to name 
and find, as it is not necessarily based on a specific website or a single search term. 
Our BUZZ-F project, related to French virality and conducted jointly with the BnF
Datalab over the academic year 2021-2022,11 highlights these searchability difficulties
in collections that are not yet fully indexed in plain text. Strategies are needed
to find traces of a viral phenomenon via, for example, the selection
of URLs explicitly containing terms associated with it (Figure 6).
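
The URL-based selection described above can be illustrated with a short sketch. The Python example below is a minimal illustration under stated assumptions and does not reproduce the BUZZ-F workflow or the BnF's tooling: the list of URLs is invented, the spelling variants in the pattern are hypothetical, and the function names are ours. It keeps the URLs that contain a term associated with the phenomenon and then computes the percentage share of each domain among the retained URLs, the kind of distribution shown in Figure 6.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical spelling variants; a real study would refine this list.
PATTERN = re.compile(r"lip[-_ ]?dub", re.IGNORECASE)


def select_urls(urls, pattern=PATTERN):
    """Keep only the URLs whose text matches the pattern."""
    return [url for url in urls if pattern.search(url)]


def domain_shares(urls):
    """Return (domain, percentage) pairs, most frequent domain first."""
    domains = Counter(urlparse(url).netloc.lower() for url in urls)
    total = sum(domains.values())
    return [(domain, 100 * count / total) for domain, count in domains.most_common()]


# Invented URLs, for illustration only.
archived_urls = [
    "http://www.example.org/videos/lipdub-2011.html",
    "http://www.example.org/videos/lip-dub-du-lycee.html",
    "http://blog.example.com/2011/03/notre-LipDub/",
    "http://www.example.org/contact.html",
]

selected = select_urls(archived_urls)
print(domain_shares(selected))
```

Such a filter only sees what is named in the URL itself: a page that discusses the phenomenon under a generic address is missed, which is one reason why the results call for caution.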

The experiment, conclusive on lip dub, a lip-sync phenomenon, nonetheless
calls for genuine methodological precautions. First, the lip dubs identified in the
URLs are not representative of all mentions of lip dub, which can be found on
sites with another URL and in more generic pages. Second, lip dub, whose popularity
was of particular note in 2011, was archived by the BnF within the methods
and bounds of the time: many sites ending in .org, .com and blogs were not collected
within BnF's 2011 collections, while lip dub is very much linked to uses in
the corporate world. In addition, “the budget in terms of number of URLs defined
per domain at the time of collection may have proved insufficient to allow archiving
of all the content of a given site,” noted Antoine de Sacy and Alexandre Faye
during our collective work at BnF.

The above examples therefore push us to consider viral phenomena in terms 
of platforms and their conditions of selection, preservation, production, archiving 
and circulation, in addition to spatial and geographical contextualization. In his 
“If It Doesn't Spread, It's Dead" series of texts, Henry Jenkins (2009) developed the 
concept of spreadable media. He constructed a model based on the convergence
of media, the cross-media dissemination of content, and voluntary dissemination.
These media must be placed at the center of the historicization of virality. In this 
sense, a scalable reading that restricts itself to a zoom effect is insufficient, as the 


11 See https://bnf.hypotheses.org/19155, accessed July 11, 2023.

Figure 6: Percentage distribution of lip dub in domain names (on the top 10 sites) in BnF web archives
(a visualization made within the BUZZ-F project by Antoine de Sacy and the BnF Datalab). © BnF.


medium (and its technical and socio-cultural conditions) must itself be constantly 
considered to conduct digital source criticism (Föhr 2017). Infrastructure studies 
and platform studies add an essential dimension to a semiotic, cultural or communicational
study. They invite us to consider a “medium reading” in addition to
a scalable reading, i.e. a reading via the (cross-)media context, and as much via
the content as via its container. In the following section we will use the Harlem
Shake as a case study to illustrate this approach.


2 “Keep calm and do the Harlem shake” 


The Harlem Shake is a dance video that follows several compositional rules and 
was often uploaded to YouTube. It was a viral phenomenon in early 2013, combin- 
ing characteristics such as the global, cross-cultural dimension with cross-platform 
circulation of an ephemeral nature. The Harlem Shake, which is in some ways a
“mise en abyme” of viral contagion (Marino 2014), serves here as a salient example,
though its characteristics do not correspond to all Internet phenomena. Indeed,
some memes can circulate intensely on a single platform or in a single country.
Some others, such as the Rickroll, can, unlike the Harlem Shake, be less spectacular
in terms of audience and spread, but more consistent over time. However, the approach
we applied to the Harlem Shake through the BUZZ-F project with the
support of the BnF Datalab and through the Hivi project (i.e., scalable and medium
reading, cross-analysis of live web, press and web archives and notably those of
social networks) is applicable to other Internet phenomena.


2.1 A metaphor of viral content and memes 


On 30 January 2013, George Kusunoki Miller,12 an Australian-Japanese student living
in the United States, posted a video of himself dancing in his room to the
music by DJ Baauer, accompanied by three friends dressed in zentai costumes.13
On 2 February 2013, an Australian skateboarder collective (The SunnyCoastSkate)
responded to this publication by imitating the video14 and added a two-step narrative
construction: a helmeted character dances alone in an environment where
other people are absorbed in routine tasks. Then, when the music drops, everyone
suddenly finds themselves dressed up and swept into a bumpy trance. As the
video went viral during February 2013, it was remixed and replayed by thousands
of people.15 They kept the narrative structure and the costumes, while adding
new recurring elements to the zentai suits, such as men in underwear mimicking
sexual acts, a rubber horse mask and so on. Remaining faithful to the “original
recipe” and to the “Harlem shake” title is central to the readability and identification
of the remixes, and the absence of linguistic elements allows for easy dissemination
across several geographical areas.

At the same time, major players in the sports and cultural industries participated 
in the massive dissemination of the initial videos. The viral nature of the Harlem 
Shake did not come first from a horizontal movement, from one individual to an- 
other, but instead from major relays that amplified the visibility of original content 


12 Miller has long been active on YouTube, where he has a channel featuring various characters,
such as Filthy Frank or Pink Guy (with the pink zentai costume seen in the clip). The viral nature of his
video made his channel famous and launched his career as a singer-songwriter.

13 Miller's original video, accessed July 11, 2023, https://www.youtube.com/watch?v=8vJiSSAMNWw.

14 The “Harlem Shake v.1” video that set the standard for remixes, accessed July 11, 2023,
https://www.youtube.com/watch?v=384IUU43bfQ.

15 An example of a Harlem Shake compilation, accessed July 11, 2023,
https://www.youtube.com/watch?v=X6GSVYL6rwo.

posts (Ashton 2013). For example, on 8 February, the Original Skateboards company
performed a Harlem Shake, making the most of the free publicity generated by a
poster of the brand hanging on the wall in the young Australian skaters’ video.16
Much larger players became involved in the promotion of the original videos. The 
various members of Baauer’s label, Mad Decent, mentioned the track produced by 
the DJ on Twitter. The label’s YouTube channel redirected to the original videos 
which included Miller's, the Australian skaters’, and a few others, thus offering a 
higher visibility to their respective accounts. This strategy was also based on the fact 
that YouTube’s “Content ID service” made it possible to monetize Baauer’s track, 
both for the version produced in 2012 as published on the label’s channel and for the 
content uploaded by others, i.e. thousands of videos that used the same soundtrack 
(Soha and McDowell 2016). The Harlem Shake therefore strongly benefited from the 
work and investments of various players in the cultural industries. There was a convergence
of multiple and diverse interests, linked to the possibility of publishing content
individually and to the management of copyright by the video platform.

In addition, Miller's Harlem Shake and its thousands of remixes are largely 
part of a tradition of collective performance videos, whether sung, danced or 
both. The previous summer, 2012, saw the release of the “Gangnam Style" music 
video by Korean pop star Psy. The latter reached 1 billion views on YouTube in a 
matter of six months (the Harlem Shake reached the same figure in two months). 
Psy’s clip presents a dance routine (“ride the pony") that is fun and easy to copy. 
In this way, it gave rise to gigantic flash mobs, such as the one at Le Trocadéro in 
Paris, organized by Universal Music and the French radio station NRJ, which
brought together some 20,000 people.17 Even earlier, other viral phenomena involved
singing and/or dancing performances (for example in TV talent and reality
shows, see Hill 2014) and giant flash mobs, such as the one at the start of Oprah 
Winfrey's show in Chicago in 2010 by the Black Eyed Peas. We may also mention 
lipdubs by political parties, companies or supermarkets, from the middle of the 
2000s onwards, as well as reprises of dance routines by dance schools or by pupils
in school or university playgrounds. The Harlem Shake comes from a broader
chronology of dance or even collective mayhem: it stands at the crossroads between
commercial strategies and previous media and public practices, and was reasserted
as part of the dissemination and monetization of Facebook.


16 The Original Skateboards video, accessed July 11, 2023, https://www.youtube.com/watch?v=-unlOs_Yt3w.

17 https://web.archive.org/web/20121108041729/http://www.france24.com/en/20121105-psy-draws-thousands-gangnam-style-paris-flashmob.

The Harlem Shake was also performed in North Africa and the Middle East
during March 2013. Groups of students and high school pupils filmed the dance in 
turn. The fact that this happened in educational establishments or in public 
spaces took on an immediate political meaning. Indeed, some of the dancers were 
severely punished (Hawkins 2014). 

The traditional media echoed these political tensions. They also played a 
major role in the dissemination of the phenomenon. Television and the press 
have regularly highlighted the videos uploaded by their listeners or readers. 
Moreover, regional and local press played a role in both reporting on the global 
phenomenon and documenting its local consequences. 


2.2 Scalable reading of a “glocal” phenomenon through
press and web archives


The press is an interesting source when examining the local importance of the 
Harlem Shake in addition to the global nature of its circulation. It can be analyzed 
thanks to digitized press databases such as Europresse and Factiva, but also cou- 
pled with online press web archives, which for France are preserved at the BnF. 
We also supplemented this media corpus with web archives kept by INA (the 
French Audio-visual Institute). 

In a bilingual French-English corpus of international, national, regional and
local press, retrieved from Europresse (which combines print and web editions),¹⁸
the virality is seen to peak at the beginning of 2013. It then falls throughout
the year, until the anniversary of the phenomenon in early 2014. In the following
years (Figure 7a), articles appear sporadically on the subject. In fact, these articles
tend to evoke the Harlem Shake as a paradigm of the viral phenomenon related
to YouTube and social platforms in general.¹⁹

Figure 7b focuses on the first half of 2013 and shows a peak starting
slowly a week later than on YouTube, towards the end of March. A kind of plateau
during the first half of March could be attributed to different elements, either the
increase in mentions of Harlem Shake videos in the local and regional press, or
the authoritarian responses to which African and Middle Eastern dancers were
subject.


18 The daily press corpus retrieved through Europresse contains 2,362 articles, which are then 
observed only for the period of the first half of 2013, thus retaining 408 articles. 
19 Figures 7a, 7b, 11 and 13 were produced by Fred Pailler as part of the HIVI project.

Figure 7a and 7b: Daily frequencies of articles related to the Harlem Shake in our Europresse
corpus (7a: from 2010 to 2020; 7b: first semester of 2013).


The INA web archives show the same chronology of the phenomenon in terms of
audio-visual media.²⁰

A capture (Figure 8) dedicated to the French audio-visual web pages also
clearly shows a peak in mentions in February 2013 and, to a lesser extent, in March.

In another corpus, that of the BnF web archives related to French online
press articles published during the first half of 2013,²¹ regional daily press
websites stand out primarily (Figure 9).

In the BnF’s web archives, a search from January 2013 to May 2013 by
frequency of appearance of the term “Harlem Shake” in the DNS (domain name
system) clearly highlights regional press alongside video platforms (Figure 10).

According to the Europresse database, regional dailies (i.e., La Montagne, Le
Berry Républicain, Le Dauphiné Libéré) each published between 12 and 23 articles,
while a national daily such as Libération published 12 articles and Le Figaro 10
articles over the same period. In the international and bilingual corpus from


20 For the Harlem Shake, the INA web archive provides 547,521 results for web pages and 27,161 
results for videos (although there is some noise in the results, which also mention the booty 
shake for instance). 

21 The corpus was made available by the BnF Datalab in connection with the BUZZ-F project.

[Figure 9, screenshot of the BnF “Archives de l'internet Labs” search interface: 237,206
results for “harlem shake” in the Actualités collection, of which 203,142 in 2013 and
34,064 in 2014; lamontagne.fr is listed in the domain-name facet with 32,085 results.]

Figure 9: BnF web archives related to Harlem Shake in the collection Actualités (News) from
1 January 2013 to 31 December 2014. © BnF.


Europresse, we find a substantial presence of the regional and local press in
France, the United Kingdom and Canada.²²

Although the selection of titles via Europresse cannot produce representative
figures for each country,²³ it is still possible to compare the newspapers’ profiles. The
regional daily press published articles on the Harlem Shake for a longer period than
the international and national press. The purple boxplot (Figure 11) covers practically


22 It should be noted that, depending on the size and linguistic composition of the countries, the 
regional or local press may hardly exist or, in contrast, may be disseminated over territories 
comprising hundreds of thousands of people, i.e. larger than the territory of some national 
presses. 

23 The corpus extracted from the Europresse database with the query “Harlem Shake” is made
up of 65% French newspapers, 66.3% articles in French (mainly from France, but also from
Canada, Belgium, Switzerland and Algeria), and 66.2% regional press (again mainly French, but
also British and Canadian).

Figure 10: Frequency of appearance of Harlem Shake in DNS (domain name system) in the
cleaned corpus (top 10) of BnF web archives (January 2013 to May 2013). BUZZ-F project, BnF Datalab.


the entire period from the beginning of February to the end of June 2013, while the
number of articles in the national press falls from mid-May. In the international
press, articles become sporadic from late March onwards. Half of the articles published
in the international press appeared before 1 March 2013, compared with 11 March for the
national press and 18 March for the regional press.
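
The 50% thresholds quoted here are medians of publication dates per press category. As a
minimal sketch of that computation, assuming each article is reduced to a publication date
and a scope label (the function name `median_date` is ours, and the dates below are invented
for illustration, not taken from the Europresse corpus):

```python
from datetime import date
from statistics import median_low


def median_date(dates):
    """Return the median publication date (lower median for even counts)."""
    ordinals = sorted(d.toordinal() for d in dates)
    return date.fromordinal(median_low(ordinals))


# Hypothetical publication dates, for illustration only.
articles = {
    "international": [date(2013, 2, 12), date(2013, 2, 28), date(2013, 3, 20)],
    "regional": [date(2013, 2, 20), date(2013, 3, 18), date(2013, 5, 2)],
}

# One median date per press category.
medians = {scope: median_date(ds) for scope, ds in articles.items()}
```

A later median for one category means that its coverage is spread over a longer period,
which is what the boxplot in Figure 11 expresses visually.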

We may then take a further look at how the Harlem Shake is treated according to
the territorial reach of newspapers, through word frequency (Table 1). If we list
the most common context words²⁴ in the title or body of national and international
English-language press articles related to the Harlem Shake, we notice “craze”,
“meme” and “viral”, while the regional press uses “craze”, “new”


24 The results are based on a 10-term window around the phrase “Harlem Shake”. The
calculations are performed using the “KWIC” (Key Words In Context) function of the
“Quanteda” text mining package for R (Benoit et al., 2018).

Table 1: Word frequency in the Europresse corpus related to Harlem Shake.

EN nat/internat press   EN regional press       FR nat/internat press   FR regional press
word               n    word               n    word               n    word               n

dance             81    harlem            24    a                110    harlem           812
harlem            62    shake             22    harlem           103    shake            810
shake             60    dance             20    shake            101    a                516
videos            46    videos            18    danse             63    vidéo            409
said              41    new               15    aussi             37    phénomène        318
youtube           38    hit               13    dun               37    plus             302
video             36    video             13    youtube           36    cest             255
week              30    craze             13    phénomène         33    personnes        232
craze             29    internet          12    plus              32    jeunes           203
song              29    students          11    cest              31    internet         203
one               26    songs             11    vidéo             31    musique          185
new               25    permission        11    style             30    fait             182
hot               21    viral             11    tout              29    vidéos           180
baauer's          21    seeking           10    fait              28    harlemshake      174
style             19    compensation      10    depuis            27    buzz             166
real              18    people            10    gangnam           26    mode             164
people            18    student            9    tunis             25    danse            161
like              17    without            9    février           25    groupe           160
billboard         17    class              9    dj                24    place            156
dancing           17    miners             9    quelques          23    baauer           156
students          16    dancing            9    vidéos            23    youtube          154
crazy             16    song               9    web               23    loufoque         152
also              16    two                9    nouveau           22    dansant          134
just              16    said               9    tunisie           21    deux             129
worldwide         15    st                 9    jeunes            21    faire            128
times             15    york               9    secoue            20    samedi           126
meme              15    used               9    l'education       20    tout             124
online            14    north              9    étudiants         19    manière          124
viral             14    australia          8    lire              19    février          124
thousands         14    recorded           8    ça                18    morceau          120


and “hit”. For the French-speaking press, the noun “phénomène” (phenomenon) and
the verb “secouer” (to shake) are favored by the national and international press.
The regional press also uses “phénomène”, along with “mode” (fashion/mainstream)
and “buzz”. The articles explicitly mention the viral nature of the phenomenon,
adding a performative dimension to the journalistic treatment of virality.
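
The context-word counts in Table 1 were obtained with quanteda in R. The Python sketch
below only illustrates the general idea of a keyword-in-context window, under our own
simplifying assumptions: lowercasing, a naive regular-expression tokenizer, no stop-word
removal, and a window that excludes the key phrase itself. The function name
`context_counts` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def context_counts(texts, phrase=("harlem", "shake"), window=10):
    """Count the words found within `window` tokens on either side of `phrase`."""
    counts = Counter()
    n = len(phrase)
    for text in texts:
        # Naive tokenizer: runs of word characters, lowercased.
        tokens = re.findall(r"\w+", text.lower())
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                counts.update(left + right)
    return counts


# Invented sentences, for illustration only.
docs = [
    "The Harlem Shake craze went viral this week.",
    "Students filmed a new Harlem Shake video.",
]
freq = context_counts(docs)
```

Sorting `freq` by count and splitting the corpus by language and press category would give
a table of the same shape as Table 1, although the exact figures depend on tokenization
choices that this sketch does not try to reproduce.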

The audio-visual media play on the same performative codes and vocabulary.
For example, the francetv.fr webpage in the INA web archives also uses the French
verb “secouer” (to shake), alongside “nouvelle tendance” (new trend) (Figure 12).

The vocabulary of collective organization (“gathering”, “square”, “leisure center”,
etc.) is present in two ways in the articles of the regional daily press: either prior
to the event, in a short article indicating the arrangements to take part in the
performance,²⁵ or afterwards, by reporting both the dance and its uploading to
YouTube.²⁶ In both cases, the regional press amplifies the phenomenon, as it gives it
more visibility and resonance, either by enabling its organization or by publicizing
its completion, conferring a local social value on it. Some amateur videos stand out:
for instance, in Chauny, in the Aisne region of France, people getting married
and their families took part in a Harlem Shake that was deemed scandalous by
the town council team. The article in the local newspaper, embedding the video
in its web version, further publicized the phenomenon.²⁷ Entertainment music
was subsequently banned from the town hall during weddings.

A special case is also that of young people who performed the Harlem Shake in
countries such as Tunisia and Egypt and were punished by the authorities or
harassed by Islamist militants. On this occasion, the national press commented on
this political opposition to the dance, even though it had not necessarily
commented on young people's video production practices before. It was the change of
socio-political context that revived the mention of the Harlem Shake in the national
press, but also in the regional press, which added an international element to its pages.

Although the Harlem Shake was born mainly thanks to the YouTube plat- 
form, many videos only had a very modest audience, contributing above all to the 
mass of videos that characterizes the phenomenon. The treatment of Harlem 
Shake by the press expands the perspectives on the original content and its uses. 
This diffraction of the phenomenon is also clear on social platforms. 


2.3 Spreadability through Twitter 


While the press and traditional media are crucial for analyzing viral phenomena,
particularly because they qualify this virality from a relative exteriority
(indicating how it happens “online”), the discussion around the Harlem Shake
took place mainly on Facebook and Twitter.


25 For example, “Un HarlemShake in Hénin. Les flashmobs sont morts, vive les HarlemShake”,
La Voix du Nord, 5 April 2013.

26 For example, “300 people for a Harlem Shake in Clermont-Ferrand. The appointment had
been given via Facebook. 300 people answered the call of a Clermont-Ferrand DJ on Saturday
afternoon to dance this very fashionable choreography on the Place de Jaude: the Harlem Shake.
The video of the shoot can be seen on lamontagne.fr.” (our translation). Published in
La Montagne on 9 March 2013.

27 https://www.aisnenouvelle.fr/art/region/chauny-depuis-l-incident-du-harlem-shake-les-
ia16b110n393024, accessed July 11, 2023.

We applied a distant reading to a corpus of some 7 million tweets covering
the first half of 2013²⁸ and containing “harlemshake” or “Harlem
shake”, making it possible to identify some information flows. The daily
frequencies of tweets in the corpus form a particularly regular peak (with the exception
of the spike of more than 300,000 tweets on 1 March), which is steadier than the
peak observed for the press (Figure 13).

Figure 13: Daily frequency of tweets about Harlem Shake during the first semester of 2013
(legend: declared language).


28 We would like to sincerely thank Frédéric Clavert who helped us collect the tweets through 
the Twitter API and Twarc. 




39.7% of tweets contain URLs. The URLs show that the virality of the phenomenon
is not only due to the strategies of the cultural industries: the social platforms
play an essential role in the circulation of these contents, through URLs
referring from one platform to another. The corpus contains 2,777,125 tweets that
include URLs from more than 24,000 different domain names (DNS). YouTube's domain
appears approx. 1,464,000 times and Facebook's approx. 131,000 times. Photo and video
sharing platforms such as Instagram, Tumblr and Vimeo also feature prominently, regularly
linking to Harlem Shake pages (e.g., http://harlem-shake.tumblr.com/, which tried
to capitalize on the publishing wave by sharing some videos). One would expect
links to the knowyourmeme.com platform to be numerous, but this is not the
case, with only some 300 URLs.
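
Counting domain names in tweeted URLs can be sketched as follows. This is a minimal
illustration rather than the pipeline used for the corpus: it assumes that the URLs have
already been extracted from the tweets and expanded (shortened links such as t.co would
otherwise hide the target site), and the function name `domain_counts` and the sample
YouTube URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_counts(urls):
    """Count how often each host name appears in a list of URLs."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # None when there is no network location
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]  # treat www.youtube.com and youtube.com alike
        counts[host] += 1
    return counts


# Illustrative URLs only; the tumblr address is the one quoted in the text.
sample_urls = [
    "https://www.youtube.com/watch?v=EXAMPLE1",
    "https://www.youtube.com/watch?v=EXAMPLE2",
    "http://harlem-shake.tumblr.com/",
    "not a url",
]
top_domains = domain_counts(sample_urls).most_common()
```

Grouping subdomains under a registered domain (for instance all tumblr.com blogs together)
is a further design choice that would change the ranking and is left out here.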

It is also interesting to note that press websites figure largely in the top
100 domain names, but in a much lower proportion (fewer than 50,000 URLs in all) than
social platforms. The articles are linked to directly, rather than commented on (unlike
the practice of live tweeting TV programs). The regional press is only marginally
present, with about fifty URLs pointing to the French newspaper Midi Libre, and
slightly fewer than a hundred URLs to Ouest France (with articles covering the
tensions in Tunisia or the Harlem Shakes of local football teams). By contrast,
it is mainly the major international newspapers (New York Times,
Wall Street Journal) and pure players (huffington post, quarz.com, mashable.com,
buzzfeed.com, etc.²⁹) as well as the BBC and CNN editorial offices that appear in
the top 100 URLs. Without the preliminary study of the regional press in the press
corpus, its role would not have emerged from an analysis of the tweets.

On Twitter, hashtags may be used to mark positions and social groups
participating in a controversy (Cervulle and Pailler 2014). This is not the case with the
Harlem Shake, which remains a rather consensual phenomenon. Hashtags mainly
describe the phenomenon and its reception and experience. Around 4,747,000
tweets even mentioned the Harlem Shake without a hashtag, i.e. without trying to
include a collective indexing criterion, as is usually the case with political
movements (#metoo) or entertainment events (#eurovision2022 or
#championsleague). Unsurprisingly, the three most used hashtags are #harlemshake (approx.
1,445,000 times), #harlem and #shake (approx. 30,000 and 20,000 times). The other
most popular hashtags point to its origin (#youtube, 19,000 times) and its
relationship with another viral dance, #gangnamstyle (approx. 16,000 times). #egypt
and #tunisie appear in the top 100, with 2,384 and 1,934 occurrences respectively,
a modest presence in the corpus compared to our press analysis.
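
The hashtag figures above rest on two simple counts: how often each hashtag occurs, and
how many tweets carry no hashtag at all. A minimal sketch, assuming tweets are available
as plain strings (the function name `hashtag_stats`, the regular expression and the sample
tweets are ours and purely illustrative):

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_stats(tweets):
    """Return (hashtag counts, number of tweets without any hashtag)."""
    counts = Counter()
    untagged = 0
    for text in tweets:
        # Lowercase so that #HarlemShake and #harlemshake are counted together.
        tags = [tag.lower() for tag in HASHTAG.findall(text)]
        if tags:
            counts.update(tags)
        else:
            untagged += 1
    return counts, untagged


# Invented tweets, for illustration only.
tweets = [
    "Just did the #HarlemShake at school",
    "harlem shake again...",
    "#harlemshake beats #gangnamstyle",
]
counts, untagged = hashtag_stats(tweets)
```

Case folding is a choice rather than a given: keeping the original casing would split one
hashtag into several variants and lower each individual count.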


29 buzzfeed.com was the first media outlet to point out the phenomenon and it immediately
attracted 300,000 views on the videos' pages (Soha and McDowell 2016).

Of the hundred most used hashtags, ten or so correspond to pure players³⁰ or
TV shows (The Simpsons, Saturday Night Live, VladTv, WorldStarHipHop, Touche
Pas à Mon Poste, etc.), one-off entertainment or cultural events (Kids Choice
Award, South by South-West, SXSW, Soundwave Festival) and sports teams
(Manchester City Football Club for example).

If we take a closer look at the approx. 500 tweets containing #SNL, the hashtag
for the TV show “Saturday Night Live” (SNL), we can see that as early as 3 February
2013, some tweets were calling for a Harlem Shake. This took place on SNL on
10 February. Some tweets praised it, while others were already expressing their
fatigue at hearing Baauer's track. During the month following the show, after the
video of SNL’s Harlem Shake was uploaded to YouTube, the last tweets expressed
their offbeat amusement. The same logic (fan request, live commentary on the
performance, later replay) can be found in tweets about other shows, such as the
French show “Touche Pas À Mon Poste”. This temporal pattern corresponds to
community management work that has been widely integrated into entertainment
media production since the 2010s.

While Twitter was a media hub throughout the Harlem Shake viral peak, the
variety of accounts that took part in the conversation around the videos, whether
held by humans or bots, remains to be questioned. More than 6,823,000 different
users tweeted about the Harlem Shake, indicating a very low ratio of Harlem
Shake-related tweets per user. Most users only expressed themselves once on the
topic. Amongst the 500 Twitter accounts that published the most about the
Harlem Shake, no known users had a prolonged impact on the phenomenon.³¹ We
can also identify bots: the most prolific account tweeted 44,655 times about the
Harlem Shake in less than six months (which amounts to an average of about 250
tweets per day), leaving little doubt about the hybrid human/bot nature of the
account. There is also a series of accounts with the terms “harlem” and “shake” in
their handle, which were created to capture the attention generated by the viral
phenomenon and have produced around 15,000 tweets.
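
The order of magnitude quoted for the most prolific account can be checked with simple
arithmetic. The sketch below assumes the full first half of 2013 (1 January to 30 June,
181 days) as the observation window; since the account was active for less than six
months, the resulting rate is a lower bound. The threshold used to flag accounts and the
account names are hypothetical and only serve the example.

```python
from collections import Counter
from datetime import date


def daily_rate(n_tweets, start, end):
    """Average number of tweets per day over an inclusive date range."""
    days = (end - start).days + 1
    return n_tweets / days


def prolific_accounts(tweet_authors, start, end, threshold):
    """Return {account: daily rate} for accounts at or above `threshold`."""
    counts = Counter(tweet_authors)
    rates = {user: daily_rate(n, start, end) for user, n in counts.items()}
    return {user: rate for user, rate in rates.items() if rate >= threshold}


START, END = date(2013, 1, 1), date(2013, 6, 30)

# 44,655 tweets spread over 181 days.
rate = daily_rate(44_655, START, END)
print(round(rate))  # 247

# Invented author list: one account with 400 tweets, one with a single tweet.
authors = ["bot_a"] * 400 + ["user_b"]
flagged = prolific_accounts(authors, START, END, threshold=2)
```

A rate of this kind is only a first indicator: it says nothing about whether an account
is fully automated or run partly by hand, which is why the text speaks of a hybrid
human/bot nature.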

Although different types of actors and business strategies have played a role
in shaping both content and audience related to the Harlem Shake, the unique
tweets published by average users remain a key element in understanding how
virality arises. In many cases, those users only told their followers that they liked or
hated watching a Harlem Shake video, or that they had taken part in one.
These tweets create the mass that makes the Harlem Shake look viral. This is


30 Media companies that only operate digitally.
31 The Mad Decent musicians initially promoted the #harlemshake through their Twitter
accounts in early February 2013, but each of them only tweeted a few times on the topic.

consistent with the conclusions drawn by Goel et al. on what they labelled
“structural virality”, a notion “that interpolates between two conceptual extremes:
content that gains its popularity through a single, large broadcast and that which
grows through multiple generations with any one individual directly responsible
for only a fraction of the total adoption” (Goel et al. 2015: 180).
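
As far as we understand their paper, Goel et al. operationalize structural virality as the
average distance between all pairs of nodes in a diffusion tree. The sketch below computes
that measure for two toy cascades, a pure broadcast and a chain of successive reshares. It
was not applied to our corpus, which would first require reconstructing who passed the
content on to whom; the edge lists and the function name `structural_virality` are
illustrative assumptions.

```python
from collections import deque


def structural_virality(edges):
    """Mean shortest-path distance over all pairs of nodes of a diffusion tree.

    `edges` lists (parent, child) links; the tree is assumed to be connected
    and to contain at least two nodes.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    total = 0
    for source in adjacency:
        # Breadth-first search gives the distance from `source` to every node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())

    n = len(adjacency)
    return total / (n * (n - 1))  # every ordered pair is counted once


# Broadcast: one source reaches four accounts directly.
broadcast = [("source", "a"), ("source", "b"), ("source", "c"), ("source", "d")]
# Chain: the content is passed on from one account to the next.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

v_broadcast = structural_virality(broadcast)
v_chain = structural_virality(chain)
```

With five nodes each, the broadcast yields a lower value than the chain, which matches the
two conceptual extremes named in the quotation.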

This mass of users has semiotic imprints (the numerous remixes of original 
videos and content variations about the topic), statistical imprints (via analytics 
and ubiquitous metrics on social platforms) and social imprints (the more the 
users know about the videos, the more varied the experiences will be). However, 
this mass is not really visible as such outside of social platforms. For example, if 
we zoom in on the Twitter archives collected by the INA for the French audiovi- 
sual domain (Figure 14), we can see differences compared to the global tweets 
analysis carried out previously. 

The INA archives show a clear peak linked to the TV show TPMP (Touche pas 
à mon poste) and give insights about how show business reacted to the Harlem 
Shake (in France). However, they don't show the bigger audience on Twitter due 
to the frame of the web archives which focus on audio-visual content. Once again, 
this demonstrates the need for scalability, but also for a medium reading, atten- 
tive to platforms, perimeters and containers. 


3 Conclusion 


Keep calm and stay focused ... By choosing to hijack the now-memetic phrase
“Keep Calm and Carry On” in the title of this article, we aimed to underline the
historical origin of certain Internet phenomena and their need for historicization.
In particular, the “Keep Calm and Carry On” wording and its history perfectly
illustrate the need to put viral phenomena into context and to take an interest in
their dissemination, their audience and even their marketing: these propaganda
posters and their motto, produced during the Second World War, were never seen by
the population at the time, as they were produced for distribution in case
Germany invaded Britain. They only emerged when they were discovered and
popularized by a bookstore owner around 2000, before being used as a slogan on
mugs or T-shirts sold cheaply on the web.

“Keep calm and stay focused” also highlights the need for a scalable and
medium reading of past memes, in addition to a semiotic approach. Crucial issues
are overlooked when the communities, infrastructures, spaces and temporalities of
memes are neglected.

Though this article suggests answers to the temporal and spatial issues raised
by viral phenomena, it also opens up several methodological options, which do not
have the same cost in terms of work and technical skills: the first consists of
building a complete database for each phenomenon, the second consists of making
observations and analyses by entwining a close reading with increased quantitative
contextualization, as set out in this paper. The latter methodological approach is
less comprehensive, but can be put into practice realistically, while being open
enough to a multitude of adaptations and new interpretative approaches.

A database must deal with the complexity of the field itself, in particular the
broad diversity of sources and formats: how should we aggregate second-hand data
(collected from APIs or scraped), initially designed for the needs of platforms, i.e., for
the capitalist exploitation of (user-generated) content and digital traces? The series
of choices to be made regarding what should be preserved, transformed or
interpreted (i.e., recoded) from one platform to another is difficult, not to mention the
question of data circulating in messaging systems such as Messenger, WhatsApp,
Telegram, Snapchat or Signal (Rogers 2020). These services do not easily provide
data unless researchers get involved in ethnographic observations and first-hand
data production. As our research is historical, and often comes after the
phenomenon has vanished, it does not allow us to intentionally observe the phenomenon
as it unfolds. These insights are lost for historians and unpreserved by design.

Furthermore, the platforms themselves are not necessarily predisposed to
providing well-developed display tools for viral phenomena. Traceable objects
like hashtags, as used in digital methods for diffusion analysis (de Zeeuw et al.
2020), are not enough to recover these phenomena, which also call for
cross-platform analysis (Rogers 2017). Creating and finding ways of translating data and
making them interoperable (for example, metrics are very difficult to translate
term-for-term), and of comparing several platforms, several stakeholders producing,
delivering or heritagizing data, several contents, periods and geographical areas,
are fully part of the investigation.

Thus, our alternative is much more modest, but realistic and, when reiterated 
for different platforms and different periods, it may help to identify some patterns 
of circulation, replication, and transformation of original content. The research we 
presented in this article can be further developed by classifying/clustering visual 
and textual contents for example. The classification of images, be it manually or 
with the assistance of computer processing, tackles semiotic problems by distin- 
guishing images that may be either variations on an initial image, or resignifica- 
tions of the initial image, or even contain iconic motifs that refer back to the 
original image, while making it difficult to technically establish a formal link with 
it (Julliard and Daller 2023). Topic modelling, sentiment analysis or the Reinert
method (via Iramuteq software or R package Rainette, for example) can also be applied
to the texts related to images. Text analysis can help to understand circulation in
different ways: comparing different contexts (different times or cultural/linguistic
areas), and even, in very specific cases, identifying the match between social graph
and semantic clusters (Ratinaud & Smyrnaios 2014). In any case, this implies a
combination that takes advantage of a scalable and medium reading which affords the
container as much importance as the content, while shedding light on the meaning in
context.


Zooming in on Shot Scales: A Digital
Approach to Reframing Transnational
TV series Adaptations


Abstract: This pilot study illustrates an empirical cross-cultural comparative
analysis of transnational TV series adaptations. It investigates patterns of shot scale
distribution in conjunction with gender and display of emotions to uncover and
compare cultural representation in France and the US. The study showcases 16
episodes of Law & Order: Criminal Intent and its French adaptation, Paris Enquêtes
Criminelles, for a total of 44,602 frames. Relying on the deep learning toolkit, the
Möbius Trip, we propose an objective shot-scale model to frame and quantify a
large quantity of data. We propose a layered, four-level reading of the data through
intercultural models, media theories and feminist and psychological approaches to
articulate cultural decoding. This study provides insights into the ethics of
televisual representations of male and female characters on screen across cultures and
the process through which cultural proximity is achieved.


Keywords: transnational TV series adaptations, artificial intelligence, culture, big 
data, cross-cultural comparison 


1 Introduction 


This pilot study investigates the impact of emotion display and shot scales on
cultural and gender representation in transnational TV series adaptations.
Showcasing eight episodes of the American TV series Law & Order: Criminal Intent, and
their equivalent French adaptation, Paris Enquêtes Criminelles, amounting to
44,602 frames, we seek recurring patterns of shot scales in conjunction with
gender and characters' expressions of emotions to uncover and compare cultural
representation in France and the US. We rely on our AI toolkit, The Möbius Trip, a
multimodal analysis engine based on machine learning techniques, to conduct
our research. We establish a shot-scale model based on strict conventions that
provides a steady rationale to label, classify, measure and compare visual data.
Following a dynamic model of close/distant reading, we conceptualize different
levels of reading of the audiovisual text by gradually zooming in on the data. We
propose four levels of reading as we include new variables. We first focus on shot
scales between the French show and the American show at a cultural level. Next,
we zoom in and analyze male and female characters through shot scales. Lastly,
we zoom in again and look at the characters’ emotions through shot scales. The
episodes are reorganized in the shape of graphs and visuals to better discern
patterns and ease comparisons between the French and the American versions. There
are substantial differences between the two versions of the show, exposing the
context and values of the societies they are embedded in. Big data combined with
extremely detailed depictions help us understand the intricacies of cultural
representations on-screen between France and the US. Our innovative approach offers
unprecedented data and opens the arena for new comparative cultural studies in
film and TV series. The project is still developing, and we are presenting tentative
results of pilot projects to experiment with and evaluate the validity of the
current stage of our work.


2 Concepts 
2.1 Framing / Reframing 


Our comparative research on transnational TV series between Law & Order:
Criminal Intent and its French adaptation Paris Enquêtes Criminelles is grounded
in framing theory. This theory, pioneered by sociologist Erving Goffman in
1974, contends that frames enable people to “locate, perceive, identify, and label”
the flow of information around them (Goffman 1986: 21). The primary function of
framing is thus to describe, organize or structure message meaning. To Goffman,
framing is a system through which people can understand culturally determined
definitions of reality and make sense of the world. It is, therefore, a necessary
part of human communication. Framing plays an important role in how a
particular issue is presented to people and how they perceive it.

In cinematography, framing refers to all the elements that appear in the
frame, as well as the way they are arranged to convey meaning. It makes
audiovisual texts intelligible by decoding the meaning system carried in each frame and
allows us to understand better how directors fill the screen to manipulate the
audience. The director has access to multiple techniques, such as shot scale, camera
angles, color, light and aspect ratio, among others (Doane 2021: 76; D'Angelo and
Kuypers 2010: 248). In addition to film techniques that explore the possibilities of
cinema, Renita Coleman refers to visual framing “to mean media content that is
processed by the eye alone” (D'Angelo and Kuypers 2010: 236). Coleman explains
that visual framing research is concerned with the portrayal of race and gender
stereotyping as well as emotions elicited by images and their effects on viewers
(D'Angelo and Kuypers 2010: 244). This approach is appropriate for our research,
which is concerned with gender representation on screen.

We study transnational TV series adaptations, which consist in adapting a
narrative structure to a domestic context. We rely on framing theory to analyze
elements that constitute a televisual text and understand how it is reframed to suit
another cultural environment. Comparing the two crime shows highlights the
framing choices (e.g., film techniques) through which cultural representation is achieved.
Such research emphasizes the importance of framing and the ways in which it
impacts how a story is told; as French film theorist Jean Mitry points out, “The story
will be the same, but the impressions, emotions, ideas and feelings expressed will be
utterly different” (Mitry 1997: 135). The comparison makes data meaningful by
providing a point of reference for aesthetic choices and cultural differences. Russian
philosopher Mikhail Bakhtin stated, “In the realm of culture, outsideness is a most
powerful factor in understanding. It is only in the eyes of another culture that
foreign culture reveals itself fully and profoundly” (Bakhtin 2010: 1). The aesthetic
choice of the adaptor is motivated by cultural proximity, the idea that audiences
favor media that reflect their own local culture (Burch 2002: 572). Hence
comparative study informs us about the communication style of each culture, its cultural
norms (e.g., emotion display), as well as its tradition of filmmaking.

Framing theory also applies to our analytic method. Managing a large quantity
of visual data is contingent on the objective framing (identifying, labeling and
classifying) of gender representation, emotion display and shot scale. We reframe
shot scale by challenging its loosely defined conventions and proposing a
new framework to objectively quantify, measure and compare shot scales.


2.2 Shot scale 


Our research focuses on the implications of shot scales in transnational TV series
adaptations. Shot scale, “the apparent distance of characters from
the camera, is one of the most effective visual devices in regulating the relative
size of characters' faces, the relative proportion of the human figure to the
background and arranging film content according to its saliency (Carroll and Seeley,
2013)” (quoted in Rooney and Bálint 2018). It is one of the vital cinematographic
features: it regulates the relative size of characters' faces and the relative proportion
of the human figure to the background (Salt 1992; Bowen and Thompson 2013),
arranges film content to emphasize an element (Rooney and Bálint 2018) and
directs the audience's gaze to particular elements (Cutting 2021: 2). Shot scaling is
not just a technique of film language; it is an element of representation
that might carry meaning.

Film studies scholar Annette Kuhn defines shot scales as “An informally agreed
and widely accepted set of conventions that describe and define different framings
of a film image, or apparent distances between camera and subject” (Kuhn 2012:
1321). Depending on the model, shot scales are divided into seven or nine
categories. They typically range from Very Long Shot (VLS) through Long Shot (LS), Medium
Long Shot (MLS), Medium Close-up (MCU) and Close-up (CU) to Big Close-up (BCU).
Though shot scales are a fundamental expressive tool of film language, the
terminology is quite elastic and deals mainly with concepts (Arijon 2015: 31); it is
approximate and not always consistent. As Monaco describes, “One
person's close-up is another's ‘detail shot,’ and no Academy of film has (so far) sat
in deep deliberation deciding the precise point at which a medium shot becomes a
long shot or a long shot metamorphoses into an extreme long shot. Nevertheless,
within limits, the concepts are valid” (Monaco and Lindroth 2000: 197). The shot scale
frameworks of Barry Salt and Daniel Arijon (Figure 1(a, b)) differ in their proportions
as well as in their terminology (e.g., MLS and knee shot). A big close-up (BCU) is
sometimes referred to as an extreme close-up (XCU). Likewise, a medium-long shot (MLS)
can also be called a knee shot; it differs from an American shot, which starts above
the knees but is not always included in formal frameworks. In sum, conventions often
diverge and the contrast between shot scale models can be significant.


[Figure 1 panel labels: Extreme close up, Medium close up, Waist shot, Medium shot, Knee shot]

Figure 1: (a) Barry Salt's shot scale framework, Barry Salt, https://cinemetrics.uchicago.edu/salt.php,
accessed July 31, 2023. (b) Daniel Arijon's shot scale framework, Daniel Arijon. Grammar of film
language, Figure 3.6: “Types of shots”, p. 36 of 706.


2.3 Aspect ratio


Law & Order: Criminal Intent and Paris Enquêtes Criminelles share the same
narrative but are shot in different aspect ratios. Movies and TV series tell a story through
time, and the aspect ratio plays a part in how the story comes across. Aspect ratio
is the ratio of the width to the height of an image. The format has evolved with
technological progress and cinematic trends. For TV shows, the aspect ratio
has had to follow the evolution of television devices. For decades, the standard ratio
for television was 4:3 (1.33:1), which fit the squarish frame of the television sets of the
time. 4:3 is close to the Academy ratio (1.375:1), which was standardized by the
Academy of Motion Picture Arts and Sciences as the standard film aspect ratio in
1932. The 4:3 ratio was replaced by 16:9 (1.78:1) in the 1990s with the
advent of widescreen HDTV. The rectangular widescreen display offers a more
immersive and cinematic experience to viewers. It is a compromise that allows
the audience to watch blockbuster films as well as regular television programs.
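
As a rough illustration of the arithmetic behind these formats, the sketch below computes the decimal form of an aspect ratio and the share of a 16:9 screen left unused when a 4:3 image is shown at full height (pillarboxing). The function names are illustrative only and are not part of the study's toolkit.

```python
from fractions import Fraction


def aspect_ratio(width: int, height: int) -> Fraction:
    """Return width:height as an exact, reduced fraction."""
    return Fraction(width, height)


def pillarbox_share(content: Fraction, screen: Fraction) -> Fraction:
    """Share of the screen width left empty when narrower content
    is displayed at the full height of a wider screen."""
    if content > screen:
        raise ValueError("content is wider than the screen; letterboxing applies")
    return 1 - content / screen


four_three = aspect_ratio(4, 3)
sixteen_nine = aspect_ratio(16, 9)

print(f"4:3  = {float(four_three):.2f}:1")    # 1.33:1
print(f"16:9 = {float(sixteen_nine):.2f}:1")  # 1.78:1
print(f"unused width: {float(pillarbox_share(four_three, sixteen_nine)):.0%}")  # 25%
```

Exact fractions are used so that pixel dimensions such as 1920 x 1080 reduce cleanly to 16:9 without rounding.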

Aspect ratio plays “a fundamental, determining role in forming and framing
television's spaces” (Cardwell 2015: 83). It also shapes filmmakers' creative
choices. Bordwell explains, “Because home video might crop part of the image,
some directors compose their shots so that the key action is concentrated in an
area that will fit smaller displays” (Bordwell, Thompson, and Smith 2017: 47). By
altering artistic choices, the aspect ratio can affect a TV series' style and
mood. For instance, The Wire creator David Simon refused to switch
from a 4:3 to a 16:9 ratio because the creators decided to “use 4:3 to connote
both a classic televisual aesthetic and unglossy, social realism, and to explore the
not-yet-fully exploited spatial possibilities of 4:3” (Cardwell 2015: 95). A 4:3 ratio
can provide an old-fashioned feel that contributes to the style of a show;
it can also feel realistic because the ratio allows the frame to be filled without
standing too far from the character, which makes it the format preferred by
comedies and dramas.


3 Framing the research 


This section introduces the theoretical framework that structures our comparative
analysis of shot scales and emotion display in Law & Order and Paris Enquêtes
Criminelles. We frame our layered reading on four levels, namely
intercultural, media, feminist and psychological.

Each level provides a lens through which to read our empirical data as we zoom
in to finer granularity.


3.1 Level 0: Intercultural framing


This level looks at emotion display norms in different cultures. It introduces Ed- 
ward T. Hall's Contexting Model to account for these differences. 


3.1.1 Emotion display 


Many studies suggest that emotions are genetically hardwired into all human beings 
and that basic emotions such as happiness, sadness, anger, surprise, fear and disgust 
are universally shared (Grodal 1999: 90; Waller et al. 2008: 435) (Figure 2). Patulny
et al. explain, “People share emotions independent of age and gender, education,
status, and cultural practices, thus corroborating the universality of emotion sharing”
(Rimé 2009, quoted in Patulny et al. 2019: 104). While emotions are perceived as
innate and, therefore, transcultural, they play out differently and take on different
meanings according to the specifics of a culture's communication style. Patulny et al.
explain that cultural norms rule the display of emotions and are learned early in life.
Cultural display rules “function to regulate expressive behavior depending on the
social context” (Patulny et al. 2019: 137). Appropriate emotion display signals that a
member of a community is well integrated.


Figure 2: Facial expressions data set: joy, anger, disgust, sadness, surprise, and fear.
Source: Cohn-Kanade (quoted in Crawford 2021).


3.1.2 Contexting model


Anthropologist Edward T. Hall proposes the Contexting Model to frame how emo- 
tion display plays out in the communication style in different cultures. He presents 
two different cultural approaches to communication: high-context and low-context. 
Hall explains, 


A high-context (HC) communication or message is one in which most of the information is 
either in the physical context or internalized in the person, while very little is in the coded, 
explicit, transmitted part of the message. A low-context (LC) communication is just the oppo- 
site, i.e., the mass of the information is vested in the explicit code (Hall 1989: 79). 


Members of high-context cultures like the French culture tend to share implicit 
knowledge with their fellow community members. Consequently, interactions 
rely less on words and more on non-verbal communication cues, including facial 
expressions. In contrast, in low-context cultures like the American culture, people 
communicate with explicit content because the goal of communication in an 
American context is clarity. Hence, Americans rely less on facial expressions and 
therefore display fewer emotions. 


3.2 Level 1: Media framing 


This level frames the media aspect of this investigation. It covers
technological evolution, style and genre, and spectatorship practices. This level of
analysis is concerned with technical elements and what they tell us about the
cultural traditions of film and TV series practices.


3.2.1 Technology 


David Bordwell approaches the stylistic evolution of shot scales from a technical
and historical perspective in cinema and TV series.¹ He explains that technological
progress forces directors to rethink their aesthetic approach and adapt their
cinema techniques (Bordwell, Thompson, and Smith 2017: 46). For instance, the
rise of CinemaScope in the 1950s led to a new aspect ratio for cinema screens, which
in turn affected the way directors use shot scale. The same change
operates in the television industry. Older, smaller TV sets with a 4:3 ratio
and poor definition forced “the TV series directors to rely on closer, more visible
shots” (Bordwell, Thompson, and Smith 2017: 47). In the 2000s, wider
and bigger screens resulting from technological advances spread worldwide and
became the new standard for the TV industry. The new ratio had an impact on
directors, who had to rethink their artistic choices. One effect is that medium
shots and long shots appear to be of normal scale (Doane 2018). The point of
contention is whether the close-up is compatible with a wider ratio. For
Bordwell, a wider screen made close-ups unnecessary: “Directors even
refrained from using close-ups, perceived as too aggressive, and enticed them to
use distant framings and full-size figures” (Bordwell, Thompson, and Smith 2017:
47). By contrast, French film critic André Bazin believed that the close-up could be
sustained with wider screens, and “the ‘useless’ space that surrounds faces is thus
not as useless as all that; on the contrary, it highlights those faces, not in relation
to the frame but in restoring them to a natural relation with space” (Cardwell
2015: 92). Media scholar Paul Frosh takes a cautious approach when he says, “the
headshot – although by no means entirely eroded – has become less dominant in
the televisual repertoire than previously since it does not fully exploit the more
contextual and epic dimensions of the widescreen format” (Frosh 2009: 98).


1 Cinema conventions impact TV series.


3.2.2 Style and genre 


Jason Mittell explains that television genres result from a body of production
techniques, textual aesthetics and historical trends (Mittell 2004: xi). Indeed, technology
affects production techniques, which in turn affect the style of a production and,
consequently, its genre. Hence, a TV series genre is defined through historical
periods and the changing use of these techniques over time. Shot distances often depend
on the film's narrative, genre and overall style (Edgar-Hunt et al. 2015: 123). Focusing
on the impact of shot scale on TV crime shows, Barry Salt concludes, “The main
variations in shot scale seem to depend on genre” (Salt 2001: 112). For instance, westerns,
war films and adventure films contain large numbers of very long shots, whereas melodramas
and comedies are generally shot from closer range (Roggen 2018: 10). Relying on a corpus
of twenty crime shows, Salt concludes that crime TV shows are shot predominantly
in CU, which far outnumbers any other shot scale (Salt 2001: 104).

Genre is a fluid concept that includes a group of procedures regarding
composition, style and topics (Albera 1996: 144). Typically, film techniques abide by
the codes and conventions of the genre they portray. For instance, Nick Redfern
established that comedy and romance have bright colors (Redfern 2021: 265).
James Cutting demonstrated that action films tend to have a shorter average shot
length than dramas (Cutting 2014: 76) and that a faster pace correlates with closer
shots (Cutting and Candan 2015: 41). Upon comparing both shows, Digeon and
Amin determined that Paris Enquêtes Criminelles is a hybrid version of the
American original (Digeon and Amin 2021). The French version is almost twice as
fast, has brighter and warmer colors and contains more music. Based on these
findings, we can say that the French version is a hybrid of the original show because
it contains elements associated with comedy and action.


3.2.3 Spectatorship cultural practices 


Just as technology, genre, historical period and domestic filmmaking traditions
affect the use of shot scales, domestic consumers’ habits also influence directors’
artistic choices. Comparing the use of shot scales in Hollywood and German cinema
from 1910 to 1939, Nick Redfern claims
that the early cinema scale convention was to mimic the point of view of an audience
in live theaters, resulting in “long-shot distance, frontal perspective, unity of
viewpoint, and relative narrative autonomy” (Redfern 2010: 7). Consumers' habits evolve
over time, influenced by other media as well as global practices. Globalization
seems to lead to the uniformization of consumers' habits. Redfern concludes that
both German and US cinema evolved from the distant framing of the medium-long
shot and long shot to an increased use of medium shots and medium close-ups.

The growth in the size of home cinema TV sets since the 2000s has also
affected viewers’ habits and modified directors’ TV series practices. Troscianko et al.
found that “bigger is better” for faces and for landscapes (Troscianko et al. 2012:
416). That is, featuring faces and landscapes offers a more immersive, more
engaging experience for the viewer. Hence, we can imagine that TV series directors
need to follow the trend and add more CU and LS to comply with consumers' habits.


3.3 Level 2: Feminist framing 

This section is concerned with a feminist approach to gender representation in film 
and TV series. The theories presented here enlighten us on gender inequality on 
screen, emotion display rules and the gaze of the camera. They reveal the blatant 
inequalities between genders and shed light on the practices that breed them. 


3.3.1 Gender representation on screen 


Women have been widely underrepresented on screen. The non-profit research
organization the Geena Davis Institute stunned the world in 2004 when it exposed the
blatant inequality and underrepresentation of women in terms of screen time in the
film and TV series industries. According to Geena Davis, “for every one female
character, there were three male characters. If it was a group scene, it would
change to five to one, male to female” (Savage 2011). Such inequality is also
reflected in the TV industry worldwide. The Conseil Supérieur de l'Audiovisuel
(CSA), a French institution that regulates the various electronic media in France,
concurs with the Geena Davis Institute's findings, showing a similar imbalance in
gender representation in the TV industry (CSA 2020: 8). In a previous study on Law
& Order: Criminal Intent and Paris Enquêtes Criminelles, Digeon and Amin revealed
the overwhelming dominance of male characters’ screen time, with a 75%-25%
average split (Digeon and Amin 2021: 15).


3.3.2 Emotion display 


In addition to cultural differences and social context, emotion display rules
depend on gender. Each society abides by different gender norms. These gender
norms play out in every aspect of everyday life and rule the display of emotion.
Anger display is typically more accepted in men, while women are tacitly
encouraged from an early age to hide their anger. In turn, men are taught to hide their
feelings of sadness (Patulny et al. 2019: 137). These cultural norms are represented on screen
because TV series mirror the society they are embedded in. As Chesbro et al.
point out, “While depictions of men in film have tended to be extremely
masculine, depictions of women have tended to be extremely feminine” (Chesbro et al.
2013: 325). TV series mimic the behavior and reinforce it. Doane explains, “women
are more emotive, with access to a greater range of facial expressions than men”
(Doane 2021: 128).

In Emotions, Genre, Justice in Film and Television, Deirdre Pribram takes a
cultural approach to representations of emotions on screen. Showcasing a close
reading of the movie Crash, she states, “Police officials, detectives (public or
private), and legal personnel are often motivated by anger: moral indignation [at] the
transgression committed by the offending party; sympathy for the victims, which
usually comes displaced as an outrage at the perpetrator” (Pribram 2012: 33). Digeon
and Amin's empirical study on gender and emotions in Law & Order and
Paris Enquêtes Criminelles (Digeon and Amin 2020) concurs with Pribram’s
findings. It demonstrates that both American and French male characters display
more anger and female characters show more fear. It suggests that men could
be the perpetrators of the crimes while women would be more likely to be the
victims.


3.3.3 Camera gaze


Rooted in a psychoanalytic approach, feminist film theory has been a substantial
component of film theory since the 1970s. It focuses on gender as the centerpiece
of theoretical analysis of cinema and TV series (Kuhn 2012: 617) and deals with
the unconscious of the text and the symbolism of the image. In her foundational article
“Visual Pleasure and Narrative Cinema”, feminist film theorist Laura Mulvey coined the
term “male gaze” to describe the use of the camera as the eye of the dominant
heterosexual white male (Mulvey 1989: 347). The camera becomes the medium through
which “women's bodies are objects that give pleasure through voyeuristic and
fetishistic forms of scopophilia, pleasure in looking” (Oliver 2017: 452). The concept
of the male gaze assumes that the camera operates as the eye of a white
heterosexual man who typically objectifies women. Based on this idea, shot scales are
consequential and become a pivotal indicator of the conscious (or subconscious)
expression of sexual desires. Among all the shot scale types, the CU raises the
most interest because it “tends to celebrate an intimacy of scrutiny that both
magnifies the attractiveness of the film and produces an undue pleasure of looking,
one that empowers an aggressive sexual instinct to conquer a desirable object”
(Deppman 2021: 2). CUs have a mystique that seemingly glorifies women while
oppressing them. For male characters, the CU is also equivocal because it is “aligned
with castration, a psychic threat to masculinity” (Doane 2021: 138). For
these reasons, we can assume that the CU is more prevalent for female characters.
However, Doane warns us, “It would be highly inaccurate, of course, to suggest that
only women have close-ups” (2021: 138).

In contrast to the CU, the long shot seems to do the exact opposite. “A long
shot is just as stylistically compelling, morally problematic, and theoretically
suggestive as the close-up in the study of the complex relations [between] ethics, film
style, and female power” (Deppman 2021: 5). A long shot desexualizes women's appearances
because the audience has a complete picture of a woman that looks like real life
(Deppman 2021: 4). Bazin called it the natural “fact” of life (Bazin 2005: 35). Based
on Deppman's argument, some types of shots carry connotations and might convey
meanings beyond a shot scale's aesthetic and dramatic potential.


3.4 Level 3: Psychology framing 


For this level of framing, we introduce cognitive film theory and the Theory
of Mind (ToM) to examine the impact of shot scale and emotion display. Cognitive
film theory informs us about shot scale distribution at the narrative level and
how aesthetic, artistic and creative choices allow the audience to understand
film. In turn, ToM is concerned with audiences’ emotional involvement through
shot scale combined with characters’ emotion display.


3.4.1 Cognitive film theory 


Cognitive film theory focuses on how movies are structured to convey their
narratives to viewers. A cognitive approach allows us to “study the predispositions of
the mind – its perception, cognition, and affect” (Cutting 2015: 192). Salt and
Kovács find systematic regularity in shot scale distribution patterns in directors’
work (Kovács 2014: 2; Salt 2009: 403). However, Kovács suggests that consistent
shot scale distributions cannot result from a conscious decision by the movie
director. To explain what makes filmmakers use similar kinds of shot scales in
different films, he proposes a cognitive hypothesis. He explains, “there are some
psychological, perceptual constraints that rule the relative rate of closer or longer
shots independently of the conscious choices of the authors” (Kovács 2014: 12).
Directors might unconsciously follow aesthetic rules, narrative types, style trends
or genre conventions.
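
To make the notion of a shot scale distribution concrete, the sketch below turns a list of per-shot labels into relative frequencies and measures how far two distributions differ. The two example films are invented for illustration, and the total variation distance is only one possible measure; it is not necessarily the one used by Salt or Kovács.

```python
from collections import Counter

# Shot scale labels, ordered from most distant to closest.
SCALES = ["VLS", "LS", "MLS", "MS", "MCU", "CU", "BCU"]


def distribution(shots: list[str]) -> dict[str, float]:
    """Relative frequency of each shot scale in a list of shot labels."""
    if not shots:
        raise ValueError("at least one shot is required")
    counts = Counter(shots)
    return {scale: counts[scale] / len(shots) for scale in SCALES}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Distance between two distributions: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in SCALES)


# Invented shot lists for two hypothetical films by the same director.
film_a = ["CU", "CU", "MCU", "MS", "LS", "CU", "MCU", "MS"]
film_b = ["CU", "MCU", "MCU", "MS", "LS", "CU", "MS", "MS"]

dist_a = distribution(film_a)
dist_b = distribution(film_b)
print(dist_a)
print(f"distance between films: {total_variation(dist_a, dist_b):.3f}")
```

A small distance across a director's films would correspond to the kind of regularity described above; what counts as "small" remains an analytical choice.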

In fact, television genres rely on viewers’ familiarity with forms and
conventions. Through a cognitive process, the viewer understands the framing
of the film and anticipates its codes and conventions (Bordwell 1992: 184).
“These processes involve the ‘construction’ of perceptual or cognitive ‘conclusions’
based on nonconscious inferences, which are in turn constituted by ‘premises’
offered by perceptual data, internalized rules and schemata, and additional prior
knowledge” (Bordwell 1985: 31, quoted in Nannicelli and Taberham 2014: 8).
Spectators tacitly agree with the conventions of a particular style and its mood. They
anticipate the characters’ representation, display of emotions and film techniques that
characterize a show. They unconsciously understand the informative and dramatic
functions of shot scales and feel emotionally engaged with the narrative.


3.4.2 Theory of mind 


Theory of Mind (ToM) refers to the psychological process by which people
recognize and understand the mental states of others. Here, ToM bears on the audience's
affect, as it is concerned with the way shot scales impact film viewers. Along with
other film techniques that contribute to regulating emotional involvement in a
show, it is accepted that “the closer the image, the more it raises emotional
arousal, the more distant the image, the more distant the viewer's emotional
relation to the image” (Kovács 2014: 1). Several empirical studies have confirmed this
precept. Shot scale has been widely considered the most potent means to
convey emotional intensity to an audience (Benini et al. 2016: 16501); it affects
viewers' responses related to “character engagement, such as theory of mind, emotion
recognition, and empathic care” (Savardi et al. 2021: 3). Among the different shot
scales, the close-up is often perceived as the most effective tool to display facial
expressions and convey emotions to an audience. French philosopher Gilles
Deleuze calls the close-up the “affection image” par excellence (Bordun 2017: 90).
Framing a strong emotion with a close-up increases the emotional response of
the audience. As Rooney and Bálint put it, “Close-ups of sad faces produced higher
levels of ToM-self than other conditions” (Rooney and Bálint 2018). Hence, a sad
close-up is more likely to trigger a strong response than a neutral close-up
because it is associated with higher levels of Theory of Mind. The other types of
shots are not as thoroughly addressed as the CU, which is perceived as the most
important shot type. Nonetheless, Canini et al. explain, “Medium shots are
probably not specific to a definite set of emotions, thus finding a fair level of
employment in all types of filmic material” (Canini et al. 2011). Long shots do not elicit
as much of an emotional response. Therefore, we can speculate that characters’
emotional responses will be portrayed in a higher proportion of closer shots.

The intercultural, media, feminist and psychological approaches frame the present
investigation. These well-established disciplines offer a broad and already
sophisticated understanding of film and TV series, which makes them pertinent
to our study. We base our cultural comparison of shot scale and emotion display in the
shows on this framework. On the one hand, each level provides a lens through
which we read our empirical data and reveal a different aspect of the show. On
the other hand, the levels are an arena where we confirm or challenge the
theories mentioned above with our empirical data. Our layered reading of the data in
the Reading the Data section follows the same sequential order as the one
proposed in the Framing the Research section above.


4 Method


4.1 Digital approach


Our research is at the intersection of digital humanities, film and TV studies,
intercultural communication, multimodality, cultural analytics and artificial intelligence.
Because film and TV series are, in essence, multimodal, we take a comprehensive
approach with equal emphasis on the multiple modes to frame the cinematic text.
We are driven by datafication, the concept of turning real-life occurrences into
computational data, which we apply to moving images to uncover trends. We follow in
the footsteps of quantitatively motivated approaches, such as Barry Salt's statistical
style analysis, Larkey et al.’s digital comparative approach, Lev Manovich's Cultural
Analytics and Geena Davis' Inclusion Quotient, and propose a corpus-based model
supported by digital tools. In the context of this comparative cultural study on Law &
Order: Criminal Intent and Paris Enquêtes Criminelles, we focus on cultural and gender
representation, emotion display and shot scale distribution.

Inspired by software such as Videana, Atlas.ti, the Multimodal Analysis Software
and other digital tools, we have developed The Möbius Trip, a multimodal analysis
engine based on machine learning techniques. The Möbius Trip combines the
features of these tools in one toolkit and operates automatically, reducing time
constraints and human error. The toolkit transforms visual characteristics into quantifiable
elements to identify broad tendencies and non-obvious patterns in TV series (Digeon
and Amin 2020). It is equipped with automated facial recognition processing that can
objectively identify, label and quantify the gender of the characters on screen, the
emotion they display (Digeon and Amin 2020) and the shot scale in which
characters are portrayed. The Möbius Trip frames audiovisual content, manages
extensive volumes of metadata, makes complex predictions and generates visuals. It
zooms into the data by crossing a wide range of elements with each other at a large
scale to obtain comprehensive, precise and complex recurring televisual patterns.


4.2 Analytical procedure 


We propose a distant reading approach to the episodes to find and compare
otherwise invisible patterns or representations. The term “distant reading” was
coined by digital humanities scholar Franco Moretti in the 2000s to refer to a
digitally driven quantitative approach applied to a text, turning it into
graphs, maps and trees. It uncovers the governing system that generates trends
and patterns. We follow the precepts of digital humanities scholar Craig Saper's
zooming-in approach to the smart set. Saper breaks the boundaries between close
and distant reading and proposes a dynamic, layered reading of the data (Saper
2021: 115). Saper states, “Counter to Moretti, and the critics of digital humanities
alike, there is no close reading or distant reading: one can zoom in or zoom out
on all data in the same readings” (Saper 2015: 206).

Because this chapter is a pilot study, our distant reading approach is limited
to four levels of reading. For each level of reading, we combine variables (e.g.,
culture, emotions, genders and shot scales) and address them through the lens of
culture, media, feminism and psychology (Figure 3).

- Level 0: Intercultural Reading compares the emotion display between both
  versions of the show from a cultural standpoint. We analyze this reading
  through intercultural models.

- Level 1: Media Reading compares the use of shot scales between both
  versions of the show. The analysis is rooted in media theories.

- Level 2: Feminist Reading compares male and female characters’
  representation through display of emotions between both versions of the show.
  We add the shot scale variable for a camera gaze reading of the text.

- Level 3: Psychological Reading compares the display of emotions of male and
  female characters using shot scale between both versions of the show. We
  analyze the data via psychological approaches.
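
The four readings can be sketched as successive groupings of the same frame-level records, with each level combining a different set of variables (here following the variable sets listed in Figure 3). The record fields, the sample values and the function name are hypothetical; they stand in for whatever structure the toolkit actually produces.

```python
from collections import Counter

# Variables combined at each level of reading (as listed in Figure 3).
LEVELS = {
    0: ("culture", "emotion"),                          # Intercultural
    1: ("culture", "shot_scale"),                       # Media
    2: ("culture", "gender", "emotion"),                # Feminism
    3: ("culture", "gender", "emotion", "shot_scale"),  # Psychology
}

# Hypothetical frame-level records; real data would come from the toolkit.
frames = [
    {"culture": "US", "gender": "male", "emotion": "anger", "shot_scale": "CU"},
    {"culture": "US", "gender": "female", "emotion": "fear", "shot_scale": "MCU"},
    {"culture": "FR", "gender": "male", "emotion": "anger", "shot_scale": "CU"},
    {"culture": "FR", "gender": "male", "emotion": "anger", "shot_scale": "MS"},
]


def read_level(records: list[dict], level: int) -> Counter:
    """Count frames for every combination of the variables used at a level."""
    keys = LEVELS[level]
    return Counter(tuple(record[key] for key in keys) for record in records)


for level in LEVELS:
    print(f"Level {level}:", dict(read_level(frames, level)))
```

Each level is a finer partition of the same frames, which is one way to picture zooming in on the data without changing the underlying corpus.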


Level     Reading         Variables

Level 0   Intercultural   Culture, Emotions
Level 1   Media           Culture, Shot scale
Level 2   Feminism        Culture, Gender, Emotions
Level 3   Psychology      Culture, Gender, Emotions, Shot scale

Figure 3: Digeon and Amin's proposed layering approach.


4.3 Measuring emotions


Kate Crawford explains that artificial intelligence is misreading human emotions.
Kanade et al. point out several issues hindering the emotion recognition process,
such as “the level of description, transitions among expression, eliciting
conditions, reliability and validity of training and test data, individual differences in
subjects, head orientation and scene complexity, image characteristics, and
relation to non-verbal behavior” (2000: 1). Acknowledging the reliability issue and
potential bias, we rely on the FER Python package to frame emotions and improve the
accuracy of the data by training the toolkit multiple times. We estimate our
emotion recognition to be 70% accurate.
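
An accuracy figure of this kind can be estimated by comparing the tool's predicted labels with a hand-labelled sample of frames. The sketch below shows the calculation on invented labels; it does not reproduce the validation procedure of the study, which is not detailed here.

```python
def accuracy(predicted: list[str], manual: list[str]) -> float:
    """Share of frames where the predicted label matches the manual label."""
    if len(predicted) != len(manual):
        raise ValueError("both label lists must have the same length")
    if not manual:
        raise ValueError("at least one labelled frame is required")
    matches = sum(p == m for p, m in zip(predicted, manual))
    return matches / len(manual)


# Invented labels for ten frames (hypothetical, for illustration only).
manual = ["anger", "fear", "sadness", "joy", "anger",
          "surprise", "fear", "disgust", "joy", "anger"]
predicted = ["anger", "fear", "fear", "joy", "anger",
             "fear", "fear", "disgust", "sadness", "anger"]

print(f"estimated accuracy: {accuracy(predicted, manual):.0%}")  # 70%
```

A per-emotion breakdown would be a natural extension, since recognition errors are unlikely to be spread evenly across emotions.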


4.4 Reframing shot scale framework 


The ambiguity of shot scale terms and conventions addressed earlier in this chap- 
ter impacts the film industry but also affects researchers. A lack of a formal 
framework hampers the discussion and may lead to potential misunderstandings, 
inaccurate findings and inexact interpretations. The advent of new media and 
large-scale computational approaches to film analysis urges us to reframe shot 
scales with objective and consistent standards. Hence, we introduce an AI-based
shot scale framework with strict conventions (Figure 4). 

Our attempt to define shot scale conventions with stricter edges is fueled by 
the need for a coherent and reliable ratio to proceed with our large-scale data 
analysis. A detailed framework is vital to replicate a similar study with other 
shows using this approach. Furthermore, a common structure with shared con- 
ventions is needed to facilitate dialog in the film and TV series community be- 
tween film scholars and film industry professionals. Our proposed approach is an 
attempt to fill this gap by utilizing an objective convention to reframe the shot 
scale framework. The software extrapolates the relative distance of the charac- 
ter's body within the space. It reconciles the ratio-to-frame and imagined distance 
approaches and determines strict edges based on learned patterns. 
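
To illustrate what "strict edges" can mean in practice, the sketch below assigns every detected face to exactly one shot scale from a single ratio. It is not the learned model described above: a simple ratio-to-frame rule stands in for it, the cut-off values are hypothetical placeholders rather than the boundaries of Figure 4, and the ordering of the labels (in particular the position of CS) is an assumption.

```python
from bisect import bisect_right

# Shot scale labels ordered from widest to tightest framing. The ordering,
# in particular the position of CS, is an assumption made for this sketch.
SHOT_SCALES = ("LS", "MLS", "CS", "MS", "MCU", "CU", "BCU")

# Hypothetical edges on the face-height / frame-height ratio. These are
# placeholders, NOT the learned boundaries of the framework in Figure 4.
RATIO_EDGES = (0.03, 0.06, 0.10, 0.18, 0.30, 0.55)


def classify_shot_scale(face_height, frame_height):
    """Map one detected face to exactly one shot scale label."""
    if frame_height <= 0 or face_height < 0:
        raise ValueError("frame height must be positive and face height non-negative")
    ratio = face_height / frame_height
    # bisect_right places a ratio that sits exactly on an edge in the tighter
    # class, so every ratio belongs to one and only one label.
    return SHOT_SCALES[bisect_right(RATIO_EDGES, ratio)]


# Three hypothetical face heights (in pixels) in a 720-pixel-high frame.
labels = [classify_shot_scale(height, 720) for height in (20, 150, 500)]
print(labels)
```

Because the edges are fixed, two annotators (or two runs of the software) cannot disagree on the label of a given frame, which is the property a shared convention needs.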
Figure 4: Digeon and Amin's proposed shot scale framework.

4.5 Sampling


To conduct this case study, we showcase Law & Order: Criminal Intent and its
adapted French version, Paris Enquêtes Criminelles. Law & Order: Criminal Intent
ran from 2001 to 2011 for a total of 196 episodes in 10 seasons. In turn, the French
version ran from 2007 to 2009 for only three seasons and 20 episodes. The corpus for
this quantitative study consists of 44,602 frames. It includes eight episodes of Paris
Enquêtes Criminelles, released in 2007, and the eight corresponding episodes of Law &
Order: Criminal Intent, released in 2001 (Table 1).


Table 1: List of sampled episodes: Law & Order: Criminal Intent and Paris
Enquêtes Criminelles.


Paris Enquêtes Criminelles Episodes     Law & Order Episodes

1. S01E01 Fantôme                       1. S01E16 Phantom
2. S01E02 Requiem Pour un Assassin      2. S01E04 The Faithful
3. S01E03 Le Serment                    3. S01E01 One
4. S01E04 Addiction                     4. S01E03 Smothered
5. S01E05 Scalpel                       5. S01E09 The Good Doctor
6. S01E06 Ange de la Mort               6. S01E07 Poison
7. S01E07 Un Homme de Trop              7. S01E06 The Extra Man
8. S01E08 Le Justicier de l'Ombre       8. S01E11 The Third Horseman


Source: own processing, 2021. 


5 Reading the data 
5.1 Level 0: Intercultural reading 


At this level, we perform a cultural comparison of emotional display in male and
female characters in Law & Order and Paris Enquêtes Criminelles. The data show
that emotion display follows a similar overall pattern (Figure 5). Characters
predominantly display facial expressions that convey sadness (45%, 40%), followed
by neutrality, anger, happiness, fear, surprise and disgust (Figure 5). Table 2
mirrors the graph in Figure 5; it highlights the fact that characters display a wider
range of emotions in Paris Enquêtes Criminelles than in Law & Order: Criminal
Intent. The French appear to display more neutral features than the Americans
(+3.1%), and they also display more anger (+0.40%), more fear (+0.7%), more
happiness (+0.8%), more disgust (+0.10%) and more surprise (+0.10%). In turn, the
American characters appear much sadder (+5.20%).




Table 2: Emotions display comparison between Law & Order and Paris
Enquêtes Criminelles.


Emotions     American    French

Sad          45,30%      40,10%
Neutral      31,30%      34,40%
Angry        13,80%      14,20%
Happy        8,50%       9,30%
Fear         0,70%       1,40%
Surprised    0,40%       0,50%
Disgust      0,00%       0,10%


TOTAL        100,00%     100,00%


Source: Own processing, 2022. 
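
The percentage-point gaps discussed in this section can be recomputed directly from Table 2. The short sketch below does so; the dictionary and function names are ours and purely illustrative, and the values are transcribed from the table with decimal commas converted to points.

```python
# Percentages transcribed from Table 2 (American = Law & Order: Criminal Intent,
# French = Paris Enquêtes Criminelles).
AMERICAN = {"sad": 45.3, "neutral": 31.3, "angry": 13.8, "happy": 8.5,
            "fear": 0.7, "surprised": 0.4, "disgust": 0.0}
FRENCH = {"sad": 40.1, "neutral": 34.4, "angry": 14.2, "happy": 9.3,
          "fear": 1.4, "surprised": 0.5, "disgust": 0.1}


def point_gaps(reference, other):
    """Return other minus reference, in percentage points, for every emotion."""
    return {emotion: round(other[emotion] - reference[emotion], 1)
            for emotion in reference}


gaps = point_gaps(AMERICAN, FRENCH)
for emotion, gap in gaps.items():
    # A positive gap means the emotion is more frequent in the French version.
    print(f"{emotion:<10}{gap:+.1f}")
```

Rounding to one decimal mirrors the precision of the table; the sign convention (French minus American) is a choice made for this example.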


Figure 5: Emotions display comparison between Law & Order and Paris Enquêtes Criminelles.
Source: Own processing, 2022.

The data signal that both the French and American cultures display the same range
of emotions, broadly speaking. They both follow the same trend of emotion display 
conventions in responding to crime, grief and judicial procedure. However, the 
data demonstrates that the French male and female characters are more emotion- 
ally expressive and show a wider range of emotions than their American counter- 
parts. This finding is congruent with Hall’s context cultures model, positing that 
high-context culture members rely on non-verbal communication and facial ex- 
pressions more than low-context cultures. France is considered a high-context cul- 
ture; accordingly, the French characters show a wider range of emotions than the 
American ones. The American characters, in turn, display substantially more sad- 
ness, which seems to emphasize the tragic nature of the show. 


5.2 Level 1: Media reading 


Level 1: Media Reading compares how shot scales play out in Law & Order: Criminal
Intent and Paris Enquêtes Criminelles. Gender and emotion display elements are
disregarded in this part to focus solely on shot scale and culture. We aim to find
patterns that support or challenge the claims from the media theories by looking at
shot scale distribution through the lens of technology, genre and cultural tradition.

The data show that both American and French versions of the show follow
the same overall shot scale distribution trend: MCU, followed by MS, CU, MLS, LS,
CS, and BCU (Figure 6). Table 3 sheds light on the differences: the French use
more MCU (+3.8%), CU (+1.4%), CS (+0.40%), LS (+2.20%) and BCU (+0.40%). The
Americans rely more on MS (+6.7%) and MLS (+1.6%).

The fact that both versions of the show follow the same overall shot scale distri- 
bution pattern is significant. The data is congruent with the idea that TV series follow 
a similar trend based on Hollywood conventions. We find that Americans and French 
rely mostly on MCU and MS to portray characters in a crime show. When combined, 
MCU and MS account for 80% of the shot scale distribution in the US version and 
77% in the French version. Our findings do not align with Barry Salt's claims that 
close-ups dominate crime shows. In fact, close-ups represent a small portion of our 
data set. We suspect, however, that such a drastic difference might be related to the 
discrepancies in shot scale conventions and lack of common reference. Such a misun- 
derstanding calls for a standard framework like the one we propose here. 

Zooming in on the apparent similarity of the shot scale distribution trend, we
observe that the French use a wider variety of shots (more MCU, CU, CS, LS, BCU).
The French version, more recent than its American counterpart, exemplifies the
impact of technological progress on aesthetic choices. This finding concurs with
Bordwell's idea that smaller cameras enable more flexibility for directors. Likewise, the
cinematic look of the 16:9 aspect ratio in Paris Enquêtes Criminelles might impact
shot scale preferences, encouraging wider shots. Taking advantage of the ratio, the
French use more LS (+2.20%) to highlight the Parisian background. They also display
more MCU (+3.80%) and more CU (+1.40%) than the American version (Table 3). This
trend signals that the French explore closer shots, supporting Bazin's claim that the
close-up could survive and even thrive with a wider aspect ratio.

Figure 6: Shot scale distribution comparison between Law & Order and Paris Enquêtes Criminelles.
Source: Own processing, 2022.


Table 3: Shot scale distribution comparison between Law & Order and Paris
Enquêtes Criminelles.


Shot Scales  American    French

MS
CU
MLS
CS
LS
BCU


TOTAL        100,00%     100,00%


Source: Own processing, 2022.

Finally, stylistic choices based on film techniques might explain the higher use 
of tighter shots (e.g., MCU, CU, BCU) in the French version. Digeon and Amin al- 
ready suggested that the French version is a hybrid version of the original show 
based on pace, music, and color. They observed that the pace of the French version 
was almost twice as fast as the American. Hence, the French show's greater use of
MCUs and CUs corroborates Bordwell and Cutting and Candan, who equate shorter shot
lengths with shorter-scaled shots (Bordwell 2006: 137; Cutting and Candan 2015: 56).


5.3 Level 2: Feminist reading 


At this level of reading, we look at the representation of women on screen based
on screen time, display of emotion and the camera gaze. First, we contextualize
this section by calculating the screen time per gender per show to put our data in
perspective. We find that male characters account for 79.4% of the characters
displayed on screen in Law & Order: Criminal Intent, while female characters
represent 20.6% (Figure 7). In Paris Enquêtes Criminelles, male characters account
for 78.5% of the characters displayed on screen, while female characters represent
only 21.5%.

Our findings reveal that female characters are heavily underrepresented
in both the French and American versions of the show. This trend is consistent
with the Geena Davis Institute's and the CSA's claim that men dominate screen
time in film and TV series. However, the gap between genders is wider
than in more recent shows analyzed by the Institute. Such a contrast might be
symptomatic of shows from the late 1990s and early 2000s, when gender difference
on screen was not addressed as it is today.

Figure 7: Gender screen time in Law & Order and Paris Enquêtes Criminelles.
Source: Own processing, 2022.

Next, we analyze the display of emotions between male and female characters
as represented in both cultures. The data (Table 4) show that American male
characters display more anger than female characters (+2.8%). In contrast, American
female characters show more sadness (+3.2%), more happiness (+2.2%) and
more fear (+0.6%). The French male characters display more sadness (+2.30%),
more anger (+2.80%) and more disgust (+0.10%). In turn, French female characters
appear more neutral (+2.30%), happier (+2.40%), more scared (+0.30%) and more
surprised (+0.2%).

Overall, the data show that male and female characters in both versions
of the show follow a similar hierarchy of emotion display. Sadness is the dominant
emotion for all the characters, followed by neutrality, anger, happiness, fear,
surprise and disgust (Figure 8). This trend is consistent with the view that
emotion display is universal.

Nonetheless, we observe variations in the display of emotions at the gender 
level in both the US and French versions. Female characters show a wider range 
of emotions than their male counterparts in both cultures. Our data confirms 
Doane's claim that the display of emotion is also contingent on gender and that 
women show a wider range of emotions. While this holds, the gap between
male and female characters is narrower in the French context. Our findings
suggest that the gap in emotion display between genders is culturally specific. Further
investigation is needed to quantitatively measure the display of emotion gender
gaps across cultures and establish the validity of such a statement.
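
One possible way to put a single number on the gender gap in emotion display is to take half the sum of the absolute male/female differences across the seven emotions (the total variation distance between the two distributions). The sketch below applies this to the values of Table 4. The measure itself is our illustrative choice and not part of the original analysis, and the two female happiness cells are taken as the values that make each column sum to 100%.

```python
# Emotion shares (%) per gender, transcribed from Table 4. The female "happy"
# cells (10.2 and 11.2) are the values that make each column total 100%.
LAW_AND_ORDER = {
    "male": {"sad": 44.6, "neutral": 32.0, "angry": 14.4, "happy": 8.0,
             "fear": 0.6, "surprised": 0.4, "disgust": 0.0},
    "female": {"sad": 47.8, "neutral": 28.8, "angry": 11.6, "happy": 10.2,
               "fear": 1.2, "surprised": 0.4, "disgust": 0.0},
}
PARIS_ENQUETES = {
    "male": {"sad": 40.5, "neutral": 34.0, "angry": 14.8, "happy": 8.8,
             "fear": 1.3, "surprised": 0.5, "disgust": 0.1},
    "female": {"sad": 38.2, "neutral": 36.3, "angry": 12.0, "happy": 11.2,
               "fear": 1.6, "surprised": 0.7, "disgust": 0.0},
}


def gender_gap(show):
    """Half the sum of absolute male/female differences, in percentage points."""
    male, female = show["male"], show["female"]
    return round(sum(abs(male[e] - female[e]) for e in male) / 2, 1)


us_gap = gender_gap(LAW_AND_ORDER)
fr_gap = gender_gap(PARIS_ENQUETES)
print(f"Law & Order: {us_gap} points; Paris Enquêtes Criminelles: {fr_gap} points")
```

A value of 0 would mean identical male and female distributions; larger values indicate a wider gap. Other distance measures could be substituted without changing the structure of the comparison.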

The underlying pattern also suggests that male characters in both versions of
the show display more anger than female characters. The data concur with
Patulny et al.'s statement that “women tend to either suppress their anger or
express it by either crying, pouting or being unhappy, despite the fact that both
men and women experience the feeling of anger with the same frequency” (Patulny
et al. 2019: 139). Indeed, the wider range of female characters' displays of
emotions supports Patulny et al.'s claim. Our findings also match the Geena Davis
Institute's claim that men are more violent in a show and women are more fearful.
We can speculate that male characters' display of anger combined with female
characters' display of fear is representative of a crime show, in which men are most
likely to be the perpetrators of the crime, displaying violent behavior, and
female characters tend to be portrayed as victims (Digeon and Amin 2021).

Pribram's close reading of films and TV series, relying on scenes and examples,
provides us with in-depth analysis. She appropriately describes the most
common emotions in crime shows, namely anger and sadness (elicited by sympathy).
Yet, focusing on the binary opposition of primitive emotions implies that
other emotions are overlooked. Our multimodal empirical approach takes into
consideration the entire range of emotion display because they all contribute to a
character’s representation. Quantitative data allows for distant reading by revealing
trends and patterns to objectively support claims such as Pribram’s. Our findings
underscore the importance of a close/distant reading approach to TV series.

Figure 8: Emotion display comparison per gender between Law & Order and Paris Enquêtes Criminelles.
Source: Own processing.

Table 4: Emotions display comparison per gender between Law & Order and Paris Enquêtes Criminelles.


             Law & Order             Paris Enquêtes Criminelles

             Male        Female      Male        Female
Sad          44,60%      47,80%      40,50%      38,20%
Neutral      32,00%      28,80%      34,00%      36,30%
Angry        14,40%      11,60%      14,80%      12,00%
Happy        8,00%       10,20%      8,80%       11,20%
Fear         0,60%       1,20%       1,30%       1,60%
Surprised    0,40%       0,40%       0,50%       0,70%
Disgust      0,00%       0,00%       0,10%       0,00%


TOTAL        100,00%     100,00%     100,00%     100,00%


Source: Own processing, 2022.

We have established that the French use a wider variety of shots than the 
Americans in Level 1 Reading. In the last step of Level 2 Reading, we zoom into the 
data and analyze the shot scale used by gender. This level informs us of the impact 
of shot scales on the representation of male and female characters in both shows. 

The data show an apparent similarity in the shot scale distribution trend.
Figure 9 shows that all the characters, across both genders and cultures, are
depicted with MCU more than half of the time, followed by MS for
about a third of the overall shot scale framing. CU has a limited role in depicting
characters, as it represents only 4-6% of the general depiction of the characters.
BCU, CS, MLS, and LS represent a fraction of this overall depiction. Table 5
highlights the diversity of shot scales used to depict French female characters.

The results do not signal any significant camera gaze biases toward female 
characters. No intense focus on women’s depiction with a scrutinizing camera 
that sexualizes women has been observed. Consequently, no clear trend
supporting Laura Mulvey’s male gaze is apparent in these shows.

Figure 9: Shot scale distribution comparison per gender between Law & Order and Paris Enquêtes
Criminelles.
Source: Own processing, 2022.

Table 5: Shot scale distribution comparison per gender between Law & Order and Paris Enquêtes
Criminelles.


             Law & Order             Paris Enquêtes Criminelles
             Male        Female      Male        Female
MCU          52,80%      56,20%      58,70%      51,60%
MS           37,00%      33,40%      29,40%      33,10%
CU           4,20%       5,40%       5,70%       6,30%
MLS          3,80%       3,20%       2,00%       2,80%
CS           0,90%       1,00%       1,20%       1,80%
LS           0,50%       0,80%       2,60%       3,60%
BCU          0,80%       0,00%       0,40%       0,80%


TOTAL        100,00%     100,00%     100,00%     100,00%


Source: Own processing, 2022. 


5.4 Level 3: Psychological reading 


This level of reading is concerned with the underlying psychological processes that 
govern a director’s aesthetic choices and an audience’s emotional response to a char- 
acter’s display of emotions in correlation to shot scale distribution. We analyze the 
correlation of shot scale distribution with a display of emotions through cognitive film 
theory to shed light on the directors’ unconscious decisions and the audience’s affect. 
Earlier in this study, we have established that both versions of the show rely on a 
majority of MCUs followed by MS. We have also demonstrated that French characters 
display a wider range of emotions and that they are depicted with a wider variety of 
shot scales. This level of reading builds on these established results and offers a more 
detailed view of a character’s representation in the French and American context. 
The data reveals that both shows follow a similar shot scale distribution pat- 
tern when portraying emotions (Figure 10). The trend similarity implies there is a 
tacit rule of shot scale distribution to depict emotions that transcends a director’s 
conscious decisions. Such findings support Benini et al.’s idea that “statistical distri- 
bution of different shot scales in a film might be an important marker of a film’s 
stylistic and emotional character” (Benini et al. 2016: 16501). Both the French and 
American directors of the series use the same conventions to build tension and
suspense. The French director did not fundamentally change the shot scale
conventions, which suggests that the French audience is already familiar with this
global pattern of representation, the genre and its film techniques.

Figure 10: Display of emotions combined with shot scale distribution comparison between Law &
Order and Paris Enquêtes Criminelles.
Source: Own processing, 2022.

We thereupon zoom into the data to achieve a more granular view by adding male
and female characters’ emotion depictions combined with shot scales within and 
across cultures. Looking at the patterns of representations from a ToM perspective 
allows us to understand audience emotional engagement techniques. Earlier in this 
study, we established that female characters in France and the US display a broader 
range of emotions. We also demonstrated that they are depicted with a wider variety 
of shot scales. Building on this new knowledge, we seek a pattern of emotion display 
combined with a specific shot to achieve ToM. In lieu of describing all the data, we 
focus our attention on variations and seek outliers that might be significant. 

The results follow the patterns previously described, where all the characters
are depicted with a majority of MCU and MS (Figure 11, Table 6). We find that MCU
is the most prevalent shot scale to feature all the emotions. When featuring fear,
this ratio increases for male and female characters in both versions of the show,
reaching 78.10% and 74.50% for French male and female characters. This finding
partially aligns with ToM, which states that a closer shot elicits a stronger
emotional response and more engagement from the audience.
Though not as intense as a CU, an MCU shows a character from chest level up and
still spotlights facial expressions. Hence, focusing on the characters' fear with an
MCU likely contributes to raising the psychological tension of the narrative and
consequently transferring it to the audience. Along with fear, the display of
surprise stands out in how it is featured. Surprise is also portrayed with a high share
of MCU in male and female characters in both versions of the show. Surprise, often
a fleeting emotion, requires a close depiction of the face, focusing on displaying the
emotion to the audience. In the French version, surprise is framed with proportionally
more CU and BCU than most other emotions. We notice that American male and
female characters display anger with a greater variety of shot scales than other
emotions. Unexpectedly, anger is framed more often with MS, MLS, CS, and LS than
with MCU, CU, or BCU. Such information signals that anger is expressed through body
language rather than facial expressions for both genders. In fact, anger is the
emotion least often framed in CU for American male characters (2.5%). This finding
suggests that an American audience
might feel uncomfortable with a close shot of angry faces.
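
The increase in MCU framing for fear can be made explicit by comparing, for each group, the MCU share for fear (Table 6) with the overall MCU share (Table 5). The sketch below is a minimal worked example; the variable and function names are illustrative and not part of the original toolkit.

```python
# MCU share (%) across all emotions, transcribed from Table 5.
MCU_OVERALL = {"American male": 52.8, "American female": 56.2,
               "French male": 58.7, "French female": 51.6}
# MCU share (%) for frames labelled as fear, transcribed from Table 6.
MCU_FEAR = {"American male": 64.6, "American female": 73.2,
            "French male": 78.1, "French female": 74.5}


def mcu_lift(overall, by_emotion):
    """Return the emotion-specific MCU share minus the overall share, per group."""
    return {group: round(by_emotion[group] - overall[group], 1) for group in overall}


lift = mcu_lift(MCU_OVERALL, MCU_FEAR)
for group, points in lift.items():
    # A positive lift means fear is framed with MCU more often than average.
    print(f"{group:<16}{points:+.1f} points")
```

The same comparison can be repeated for any other emotion or shot scale by swapping in the corresponding row of Table 6.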

The data does not show any significant attempt to feature sadness with
tighter shots; this goes against Rooney and Bálint's principle that a sad close-up is
most likely to trigger a stronger response than the neutral close-up (Rooney and
Bálint 2018). It turns out that the French female characters' neutral emotion is
accentuated with a CU. In fact, the CU, which receives a disproportionate amount of
attention in the multiple theories we review, plays a limited role in the
representations of characters and the depiction of their emotion.

Figure 11: Emotion display combined with shot scale distribution comparison per gender between
Law & Order and Paris Enquêtes Criminelles.
Source: Own processing, 2022.

Table 6: Shot scale distribution comparison per gender per emotions between Law & Order and Paris
Enquêtes Criminelles.


American Male

         Sadness   Neutrality  Anger     Happiness  Fear      Surprise  Disgust
MCU      52,80%    57,40%      41,40%    52,70%     64,60%    75,00%    0,00%
MS       37,10%    35,00%      44,60%    40,60%     30,30%    16,70%    0,00%
CU       4,60%     4,60%       2,50%     2,80%      5,10%     6,20%     0,00%
MLS      3,20%     2,30%       9,60%     3,00%      0,00%     2,10%     0,00%
CS       1,30%     0,50%       1,10%     0,60%      0,00%     0,00%     0,00%
LS       0,80%     0,20%       0,80%     0,10%      0,00%     0,00%     0,00%
BCU      0,20%     0,00%       0,00%     0,20%      0,00%     0,00%     0,00%

TOTAL    100,00%   100,00%     100,00%   100,00%    100,00%   100,00%   0,00%


American Female

         Sadness   Neutrality  Anger     Happiness  Fear      Surprise  Disgust
MCU      55,30%    59,00%      45,70%    62,40%     73,20%    66,70%    0,00%
MS       32,70%    32,40%      43,70%    29,70%     24,40%    16,70%    0,00%
CU       6,10%     5,90%       3,50%     3,20%      0,00%     8,30%     0,00%
MLS      3,40%     1,70%       6,00%     3,50%      0,00%     0,00%     0,00%
CS       1,30%     0,40%       0,70%     1,20%      2,40%     0,00%     0,00%
LS       1,10%     0,60%       0,20%     0,00%      0,00%     8,30%     0,00%
BCU      0,10%     0,00%       0,20%     0,00%      0,00%     0,00%     0,00%

TOTAL    100,00%   100,00%     100,00%   100,00%    100,00%   100,00%   0,00%


French Male

         Sadness   Neutrality  Anger     Happiness  Fear      Surprise  Disgust
MCU      59,10%    57,60%      60,80%    54,50%     78,10%    62,90%    66,70%
MS       29,40%    30,60%      27,20%    31,30%     12,80%    17,70%    33,30%
CU       5,60%     6,70%       4,30%     4,40%      5,90%     9,70%     0,00%
MLS      1,80%     1,80%       2,80%     2,90%      1,10%     1,60%     0,00%
CS       1,40%     1,10%       1,30%     1,20%      0,50%     0,10%     0,00%
LS       2,40%     2,00%       3,30%     5,10%      1,60%     4,80%     0,00%
BCU      0,30%     0,20%       0,30%     0,60%      0,00%     3,20%     0,00%

TOTAL    100,00%   100,00%     100,00%   100,00%    100,00%   100,00%   100,00%


French Female

         Sadness   Neutrality  Anger     Happiness  Fear      Surprise  Disgust
MCU      52,80%    47,10%      57,40%    52,70%     74,50%    53,80%    0,00%
MS       33,10%    37,10%      27,00%    29,10%     16,40%    30,80%    0,00%
CU       5,30%     8,20%       5,90%     3,70%      3,60%     7,70%     0,00%
MLS      2,70%     2,30%       4,40%     4,00%      1,80%     0,00%     0,00%
CS       1,60%     1,80%       0,50%     2,40%      1,80%     3,80%     0,00%
LS       3,60%     3,00%       3,20%     7,00%      0,00%     0,00%     0,00%
BCU      0,90%     0,50%       1,60%     1,10%      1,90%     3,90%     0,00%

TOTAL    100,00%   100,00%     100,00%   100,00%    100,00%   100,00%   0,00%


Source: Own processing, 2022. 


6 Conclusion 


This pilot study introduces an empirical cross-cultural comparative analysis of 
transnational TV series adaptations. Showcasing 16 episodes of Law & Order: 
Criminal Intent and its adapted French version Paris Enquêtes Criminelles, we quan- 
tify characters’ screen time, display of emotion and shot scale distribution. Taking a 
deep learning approach to the data, we cross these elements to achieve greater gran- 
ularity. We zoom in and propose a layered, four-level reading of the data through 
the lens of intercultural models, media theories and feminist and psychological ap- 
proaches. We have established a shot-scale model based on AI technology to conduct 
our data gathering. Our approach proved effective in automatically labeling, classify- 
ing, measuring and comparing a large quantity of visual data. This study provides in- 
sights into the ethics of televisual representations of male and female characters on 
screen across cultures and the process through which cultural proximity is achieved. 

Our layered reading highlights different elements and enables us to confirm 
or challenge the theories we rely on. Level 0 Intercultural Reading reveals that 
the French characters display a wider variety of emotions than their American 
counterparts. It corroborates Hall's contexting model positing that high-context 
cultures, such as France, display more emotions than low-context cultures when 
interacting. Subsequently, in Level 1 Media Reading, we demonstrate that both 
the French and the US shows follow a similar shot scale distribution pattern,
featuring mostly MCU and MS. This arrangement suggests the genre conventions are
homogeneous across the two cultures. Despite the similarities, we observe that the
French version features a wider variety of shot scales than the American. Such
variations can be explained by the technological progress of screens and cameras.
Lastly, the French use more CU, confirming the correlation of a faster pace with 
closer shots and leading to a hybrid genre. Next, Level 2 Feminist Reading reveals 
the drastic inequality of female characters’ screen time in both shows. The find- 
ings confirm that the display of emotions is contingent on culture and gender. 
Part of a high-context culture, the French female characters display a broader 
range of emotions than American female characters. Interestingly, men in both 
versions display more anger than female characters; in turn, female characters 
display more fear than their male counterparts. These findings align with the pre- 
vious research led by Digeon and Amin. The study shows no trend depicting the 
sexualization of women based on shot scale, challenging Laura Mulvey’s male 
gaze. Lastly, Level 3 indicates an underlying psychological process that rules the 
shot scale distributions. This trend can result from the directors’ unconscious ap- 
plication of genre conventions and audience expectations. This reading highlights 
minor differences in representation between genders. 

Our methodology and toolkit, the Möbius Trip, contribute to digital humanities
research methods. The comparative study could be extended to other genres, periods,
directors and themes, and across cultures. It has the potential to contribute to
the study of TV series, to feminist film theories and to psychological research. To
do so, we need an exhaustive corpus of TV series and more data to truly achieve
Large-Scale Granularity.

Zoom-out, and Zoom-zero Modes
to Understand Religious Sculptures


Abstract: This paper is devoted to the significance of the scale and physical di- 
mensions of cultural heritage objects for their study and reuse in virtual environ- 
ments. The paper focuses on digitization and virtual representation of religious 
sculptures from the Perm Art Gallery. The gallery’s collection includes wooden 
sculptures of gods and saints, various decorations from churches, and a large 
carved iconostasis in the late baroque style, which is 25 meters high. The paper 
focuses on the analysis of mediation through Zoom-in, Zoom-out and Zoom-zero
modes for the study of cultural heritage through the recontextualization of objects
and the reconstruction of their contexts, and addresses the issue of the authenticity
of digitized heritage. The conclusion discusses the optics of the objects’
representations as well as the limitations and benefits of varying the scales and
coordinates.


Keywords: digital representation, cultural heritage, wooden sculpture, research 
optics, authenticity 


Adapting optics is an important component of successfully carrying out complex
research. The ability to get closer to see the details and to move away to a maximum
distance provides a complete picture of the phenomenon under study.
Digital means provide a basis for tuning research optics to obtain high-quality
results. Regarding cultural heritage, digitization determines the quality of data for
subsequent use and serves as a core pillar of high-grade digital representation.

Digitization of cultural heritage objects touches on various issues, such as
documentation of cultural heritage (Gladney 2007), dissemination of knowledge
(Cimadomo 2013), publication in the electronic environment, providing public access to
heritage (Ruthven and Chowdhury 2015), building infrastructures (Povroznik 2018),
analysis of the current physical state of objects by restorers and conservation
specialists (Uueni et al. 2017), interpretation of digital heritage (Rahaman 2018),
expanding the possibilities of studying objects based on information technology, and
reusing objects in creative industries and beyond (Terras 2015). Each of these
tasks requires adjustment of optics to interact with the objects in a digital
environment; the focus of our current project is on religious wooden sculptures.

Orthodox religious wooden sculptures form a part of material culture and
are necessarily tied to intangible heritage referring to the historical and
cultural features of the region. The spread of Christianity in the Perm
region is distinctive in that it developed later than in Western Europe.
The main reason is the slow colonization of the territory of the Kama region
by the Russian population (Makarov 2009). During the period of active construction
of Christian churches, a specific phenomenon expressed in Orthodox
wooden sculpture was formed. The wide distribution of wooden sculptures in the
north of the Perm region dates back to the seventeenth to nineteenth centuries. It
can be assumed that sculpture became a kind of ethnic and spiritual-cultural
identification of the local population in the circumstances of Orthodox colonization,
reflecting the complex processes of the cultural and spiritual turn (Vlasova
2010). Therefore, the multilayered character of religious wooden sculpture has to
be taken into account in the project.

The collection of religious wooden sculptures, which is preserved by the Perm 
Art Gallery, is the largest and most complete in comparison with similar collections 
of other cultural heritage institutions in Russia (Fond of Perm wooden sculpture 
2022). In general, the collection at the Perm Art Gallery differs from other similar
museum collections in that most of the exhibits have a detailed description, including
their location before they entered the museum, which creates fundamental
opportunities for studying the distribution and localization of sculpture in
spatial, quantitative and qualitative dimensions. The collection of religious
sculptures inherently complements the iconostasis of the Transfiguration Cathedral in
Perm. The iconostasis has an obvious cultural and historical value, as one of the 
oldest preserved carved wooden iconostases in the Perm region. The current proj- 
ect has been undertaken to digitize a part of the collection of religious sculptures of 
the Perm Art Gallery and the iconostasis of the Transfiguration Cathedral. 

The digital space allows users to view electronic objects and creates condi- 
tions for human interaction with them. This is especially important in cases 
where objects are fragile or interaction with them in the physical environment is 
impossible. In the case of wooden sculpture and iconostasis, both of these restric- 
tions are valid. This paper is devoted to the possibilities of adapting optics in a 
digital environment as an introduction to the religious sculptures, their compre- 
hensive study and further implementation of other projects involving digitized 
artifacts. The paper discusses the features of user interaction with religious sculp- 
ture in Zoom-in, Zoom-out and Zoom-zero modes, and shows the advantages of 
each of the modes to obtain the most effective results for the study and reuse of 
cultural heritage.


1 Background


Perm religious wooden sculpture is the treasure of the Perm region and one of the 
iconic collections that shapes its brand. The Perm Art Gallery has the largest collec- 
tion of such Orthodox sculptures, which currently consists of 501 items, and many 
of them are recognized as masterpieces of art and are widely exhibited in Russia 
and abroad (Vlasova 2010). The collection of wooden sculpture is diverse in terms
of chronology (the exhibits date from the late seventeenth to the early twentieth
century), types of objects (sculptures of saints, gods and persons from biblical
scenes, relief and sculptural images of angels, cherubim and seraphim, crucifixes
and complex sculptural compositions are widely represented), technique of execution
(the sculptures were created by representatives of different schools of carving,
art workshops and individual authors) and iconography (different subjects and
motifs are presented).

An iconostasis is a large wooden partition with several rows of icons placed in a
particular order. It separates the altar from the rest of the Orthodox church
(Tradigo 2006). The unique carved wooden iconostasis discussed in this paper was
created in the eighteenth century in the village of Pyskor in the north of the
Perm province. It includes a large curved frame luxuriously decorated in the
Baroque style, painted icons and Holy gates (Vlasova 2011). It was brought to Perm
at the beginning of the nineteenth century, and the Transfiguration Cathedral in
Perm was built in the early nineteenth century to house this valuable iconostasis.

In the 1930s, during the period of atheist propaganda by the Soviet government,
religion was under pressure. Churches were destroyed, often being blown up or
dismantled, or their buildings were repurposed as hospitals, warehouses and even
stables. Similar measures were taken with regard to the Transfiguration Cathedral
in Perm: the cathedral building was seized from the Russian Orthodox Church and
given to the Perm Art Gallery for exhibitions. To expand the space for the
gallery's activities, the main hall of the cathedral was divided from one large
floor into three floors of newly created exhibition spaces. For this purpose, two
additional floors surrounded by walls were constructed. One of the walls was built
close to the iconostasis, at a distance of about 1 meter, which significantly
limits access to the iconostasis for viewing. At the level of the first and second
floors of the newly created structure, observation of the iconostasis is
practically impossible due to the very narrow space remaining between the wall
and the iconostasis. Only the upper tier of the iconostasis can be appreciated,
from the third floor. In general, the view of the iconostasis in the physical
space of the cathedral is significantly limited, and digital technologies are
intended to partially solve this problem by providing access to this masterpiece
of cultural heritage. To open the iconostasis to the public, including researchers
and other audiences, the digitization of the iconostasis and the further
publication of materials in an open virtual environment are necessary.

At present, the process of restoring the building is underway. In the 2010s, it 
was decided that the Transfiguration Cathedral, where the Perm Art Gallery is 
now located, would be transferred to the diocese of the Russian Orthodox Church. 
The relocation of the gallery was delayed because artworks require special
conditions for storage and exhibition, which were difficult to ensure in a short
time. Nevertheless, the gallery will move out of the cathedral and establish its
activities in another building.

In the current exhibition, the unity of religious sculpture placed on the third 
floor of the gallery, the iconostasis and the space under the dome of the cathedral 
form a unique exposition environment, conveying one common idea. The iconostasis
and the exhibition of sculptures in the cathedral together create a unique
atmosphere and organically complement each other, in a sense revealing the
contexts in which the sculpture existed.

When the gallery leaves the cathedral for the new premises, this unity will be
broken, since the iconostasis will remain in the cathedral and the sculptures will
'move' along with the gallery's other collections. The religious wooden sculpture
will be exhibited in new spaces according to a different curatorial idea.
Therefore, digital technologies are intended to address a whole range of tasks,
including:

1) digital preservation of the exhibition space on the third floor with an exposition 
of wooden religious sculptures. At the moment, a virtual tour of the exposition 
of wooden sculptures has already been implemented and it is freely available 
online (Virtual tour on the exposition of Permian wooden sculpture 2022); 

2) providing opportunities for viewing the entire iconostasis based on the crea- 
tion of a 3D model and its publication in a virtual environment; 

3) expanding the possibilities of interaction with individual sculptures and their 
study by creating 3D models of a part of a collection of wooden sculptures 
and publishing them online. 


2 Digitization 


At the moment, photographs of sculptures with descriptions and short metadata 
are published on the gallery's website in the public domain (Foundation of Perm 
wooden sculpture 2022). Each page of the digital collection in the virtual gallery is 
dedicated to an object and contains one or more photographic images with the
possibility to zoom in by about 50%. Along with the frontal image, separate
photographs with details of the sculpture are presented in some cases. That is, 2D
photography and image presentation have so far been used in the digitization
process. Thus, the Zoom-in mode has been partially implemented in this format to
examine some features of the object.

An initiative project undertaken to further digitize the iconostasis and religious
wooden sculptures, described in more detail in this paper, uses laser scanning,
photogrammetry and spatial modeling technologies. The implemented methods and
technologies aim to complement and deepen the possibilities for reusing the
collection. To digitize the sculptures and create 3D models, photogrammetry was
selected as the main instrument. It makes it possible to create a fine
three-dimensional model from photographs of an object. The advantage of this
technology is the detailed reproduction of the texture of the object in a color
scheme very close to the original and the construction of a high-quality geometric
shape of the sculpture.

The experimental digitization and creation of 3D models of the gallery’s objects 
started with the selection of the sculptures that are meaningful from different points 
of view. As part of the initiative project, it was important to obtain a relatively small, 
but at the same time diverse digital collection of religious sculptures. As a result, it 
included sculptures from different themes and schools of iconography. The selected 
objects are sculptures of saints, Jesus Christ, the Mother of God, angels and Bible 
characters. The sculptures are made with different techniques and belong to differ- 
ent territories of the Perm region. The 3D models of the selected sculptures have 
been published online (Center for Digital Humanities 2022). Diversity is necessary to 
test the possibilities of representing sculpture in the virtual space in order to deter- 
mine the best approaches to digitization and create the final information resource. 

Moreover, the choice of objects for digitization was influenced by technical
limitations associated with the photogrammetry technology itself. For example, due
to these restrictions, objects that were predominantly monochromatic, black,
lacquered or shiny, or that contained transparent elements, were not selected for
digitization. In addition, since photogrammetry creates 3D models of visible
surfaces only, objects with a complex interior structure were not included in the
work either. These limitations are specific to photogrammetry and require the
development of an individual solution based on a combination of technologies.

The digitization of the iconostasis is a much more complicated process because
physical access to the iconostasis is blocked by constructions of Soviet times, as
mentioned at the beginning of the paper. The distance from the built wall to the
iconostasis itself is about a meter, which makes it difficult for the visitor to
access it for viewing and also negatively affects the possibilities of
digitization. To create a 3D model of the iconostasis, laser scanning technology
was used. It is widely used to digitize places of worship and their fragments
(Barrile et al. 2017). The current project is designed to solve the technologically
more complex problem of creating a 3D model of an iconostasis that cannot be
accessed directly. Because of the barriers (walls) that block access to it, the
iconostasis has to be digitized in parts.

The upper part of the iconostasis is visible for the visitors at the top level of 
the current exhibition devoted to the wooden religious sculpture and it is de- 
picted in Figure 1. 


Figure 1: The photograph of the exhibition on the religious wooden sculpture with an upper part of
the iconostasis as a background. Photo by Nadezhda Povroznik.

Figure 2 shows the lower part of the iconostasis hidden behind the wall.


Figure 2: Point cloud of the lower part of the iconostasis obtained by laser scanning. Screenshot of 
the point cloud was made by Nadezhda Povroznik. 


The digitization of the iconostasis is carried out on the basis of laser scanning of 
separate fragments of the object, followed by the merging of elements. Laser 
scanning of the iconostasis was carried out by specialists from the Faculty of Geol- 
ogy of the Perm University. Fragments of the iconostasis were scanned using a 
high-precision scanner. Then the point clouds were merged, the texture was
improved, and the illumination and colors of the images were aligned in the
darkened areas of the iconostasis, which were closest to the wall.
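
The merging of separately scanned fragments can be illustrated with a minimal sketch. It assumes, purely for illustration, that each fragment is a list of (x, y, z) points in its own scanner frame and that the rigid transform aligning it with a common frame (here a rotation about the vertical axis plus a translation) is already known. The software and registration workflow actually used in the project are not specified in this chapter, and all names and values below are hypothetical.

```python
import math


def transform_fragment(points, yaw_deg, translation):
    """Rotate points about the vertical (z) axis, then translate them."""
    yaw = math.radians(yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        (cos_y * x - sin_y * y + tx, sin_y * x + cos_y * y + ty, z + tz)
        for x, y, z in points
    ]


def merge_fragments(fragments):
    """Bring every fragment into the common frame and concatenate the points.

    `fragments` is a list of (points, yaw_deg, translation) tuples.
    """
    merged = []
    for points, yaw_deg, translation in fragments:
        merged.extend(transform_fragment(points, yaw_deg, translation))
    return merged


# Hypothetical data: two tiny fragments scanned from different positions.
lower = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 0.0, (0.0, 0.0, 0.0))
upper = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 90.0, (0.0, 0.0, 5.0))
cloud = merge_fragments([lower, upper])
```

In practice the transforms are usually estimated from overlapping areas or reference targets rather than known in advance; the sketch only shows how fragments end up in one coordinate system once they are.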

The created 3D models of the sculptures and the iconostasis are currently 
stored on local computer drives and they will be published on the website of the 
Perm Art Gallery in the near future. Digital facsimiles represent the physical
object with the maximum degree of detail and quality. High-resolution copies
provide an opportunity to document the current state of the object in order to
trace its decay and undertake necessary restoration measures. However, digital
facsimiles demand substantial resources from the digital platforms on which they
are hosted. Therefore, it is essential to choose the optimal resolution of objects
and the method of their representation. Consequently, simplification is required
for the publication of objects online. The choice of how best to represent 3D
objects depends on the technological component and the capabilities of different
platforms for publishing objects of certain physical and digital sizes, among
other factors.
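
One common way to simplify a dense point cloud before online publication is voxel-grid downsampling, in which all points falling into the same cubic cell are replaced by their centroid. The sketch below is an illustration of that general technique under stated assumptions, not the procedure used in the project; the function name, the cell size and the sample points are hypothetical.

```python
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic cell by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    cells = defaultdict(list)
    for x, y, z in points:
        # Integer cell index along each axis.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        cells[key].append((x, y, z))
    simplified = []
    for members in cells.values():
        n = len(members)
        simplified.append(tuple(sum(p[i] for p in members) / n for i in range(3)))
    return simplified


# Hypothetical points in meters: two tight clusters about half a meter apart.
dense = [(0.01, 0.01, 0.0), (0.02, 0.03, 0.0), (0.55, 0.55, 0.0), (0.57, 0.56, 0.0)]
sparse = voxel_downsample(dense, voxel_size=0.1)
```

A larger cell size yields a lighter model that loads faster but shows less detail when zooming in, which is the trade-off between facsimile quality and platform resources described above.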

At the stage of digitization, a key factor is setting up the viewing capabilities
of objects, particularly the Zoom-in and Zoom-out options. This depends on the
quality of digitization: high-resolution image capture provides a clearer, more
detailed view when the Zoom function is used.


3 Representation and authenticity in the digital space


Analysis of the possibilities of representing objects in a virtual environment
involves rethinking how the objects will be used and how users will interact with
them. Moreover, the representation of objects online must be accompanied by the
ability to adapt the optics for viewing them.

Interaction with digital objects is carried out through digital platforms where 
they are published. Objects can be viewed from different angles, zooming in or 
zooming out depending on the needs of the user and according to the capabilities 
and limitations inherent in the functionality of the platforms. That is, the ability to 
adjust the optics is determined, in addition to the quality and depth of digitization, 
also by the means that are used to represent objects in the digital environment. 

The publication of 3D models is rarely carried out directly on project websites.
More often, the models are connected to these websites through special exchange
platforms. Third-party services are in demand as an alternative to direct
publication on a project's own servers because the significant volume of space
required to host 3D models can impede the operation of the website and
unnecessarily overload the server. Exchange platforms remove many constraints, such
as limits on the number of published models. They also provide additional options
for customizing the environment in which the models are presented. Another
advantage of such platforms is their multidisciplinarity: the publication of
content from different fields, directions and topics attracts a variety of
audiences and increases the discoverability of the objects. One such popular
platform is SketchFab (SketchFab 2022).

There are several important features of the SketchFab platform that positively
affect the ability of users to view the published objects and interact with them:

1) the publication of each 3D model is accompanied by the ability to add a
description of the object and tags associated with the object, topic, time and
material, and to select a set of categories to which the object belongs, in order
to increase discoverability and ensure interconnectedness with the digital
environment;

2) open data options are implemented for free download of the model by users;

3) an environment has been developed for setting the display mode of an object, 
adjusting the location of an object along the X / Y / Z coordinate axes, selecting 
lighting, background, tools for additional texture editing and other post- 
production options; 

4) tools have been developed to customize the viewing of an object in virtual 
reality (VR) and augmented reality (AR). 


The SketchFab environment hosts 3D models of the wooden sculpture (Center for
Digital Humanities 2022). Hotspots with annotations highlighting the specifics of
the objects will be created soon, along with other options, including VR and AR.

The SketchFab platform has advantages over other similar platforms, but is 
focused mainly on placing individual small to medium sized 3D objects. At the 
same time, setting up the display of an object also requires basic knowledge in 
graphics and optics, since it is necessary to select the relevant demonstration 
modes, including depth of field to minimize distortion and exhibit the digital ob- 
ject as similarly as possible to the real object. 

Architectural structures, large sculptures and sculptural compositions can also
be published on the SketchFab platform, but their representation will not be
optimal for viewing and user interaction. The Zoom-in mode implemented on SketchFab
is optimal for observing medium and relatively small objects. For large objects,
the platform limits the ability to zoom in, as the viewpoint in the maximum Zoom-in
mode penetrates the mesh of the object into the void of the interior space, which
is not simulated. That is why the publication of the 3D model of the iconostasis
requires a different approach. The dimensions of this masterpiece are such that
placement on the platform mentioned above is not appropriate: the object appears
too small on the user's screen and the navigation options are limited. Zooming in
and viewing fragments in good resolution are impossible on this platform.

The dense point cloud of the iconostasis has been published for public access 
on the PointBox platform (3D model of the iconostasis 2022). It is possible to
zoom in and zoom out to observe the object. However, the platform limits the
resolution and the size of the point cloud, which makes it almost impossible to
observe the details of the iconostasis closely.

The ArtStation platform (ArtStation 2022) will be used to host the 3D model of
the iconostasis, since it makes it possible to create entire studios dedicated to
an object or a combination of objects. This platform implements a different
approach, which allows content of different formats to be used to create a unified
environment for representing a specific topic. In addition, an undoubted advantage
of the platform is the ability to display content in high resolution, which is
important for the representation of digital cultural heritage.

Undoubtedly, both platforms make cultural heritage more accessible and more
interactive, and allow users to create new information products (for example,
virtual galleries and exhibitions) connecting their environment and the project
websites.

At the same time, the digitization and publication of objects in a virtual envi- 
ronment raises questions related to the authenticity of the object (Fickers 2021). 
The authenticity of digital facsimiles is influenced by a large number of factors 
such as the digitization technologies used and the selected equipment, software 
that is used for the processing and visualization of models, etc. The content creator
can influence the listed factors, as well as verify the resulting digital copy, check it
for conformity with the original object (including by using a color palette) and
designate the physical dimensions of the object. That is why there is a need to
discuss digital facsimiles, framing them as critical facsimiles (Dahlström 2019).

However, there are many issues that are beyond the control of the professionals
who create the digital product. The first of these is the equipment and software
used by the users of the digital content. Interaction with the object in the
digital environment is carried out through a computer or another device, which
affects the quality of the reproduced object as seen by the viewer. The creator of
the resource is not able to influence the displays or to track the differences in
a digital object when it is shown on different monitors. This holds for the screens
of various devices, as well as for individual settings for brightness, contrast and
flicker. The factors that lie between the observer and the digital object in a
virtual environment affect the authenticity of the object, but they are beyond the
control of the developers of the digital content. In addition, the individual
perception of images on a computer or mobile device screen also has an impact on
the ability to observe the content (Bimber and Hainich 2016).

The aforementioned circumstances influence not only the quality of the objects’
representation or their perception in the virtual environment, but also lend the
materiality of the digital objects the character of a “weak surrogate” (Ireland
and Bell 2021).


4 Adapting optics, scales, and zoom modes for comprehensive study


The ability to configure optics to interact with digital objects and use them for
different purposes depends significantly on scaling. In the analog world, scale can
be represented in the following main perspectives (Goodchild 2001):

1) implication of level of spatial detail; 

2) representative fraction; 

3) spatial extent; 

4) process scale. 


In reference to wooden religious sculptures and the possibilities of studying 
them, these scales are applicable and determine the opportunities for using 
Zooming modes. A sculpture can be measured in its physical dimensions and de- 
fined in terms of space (length, width and volume). It can be represented on a 
time scale (from creation to the present moment). The sculpture has a variety of 
contexts of existence and surroundings (inside the church, in the museum and 
between them). It is represented in a different physical condition, which is espe- 
cially important for restorers. Sculptures also have a geographic scale for repre- 
sentation, since they have different origins and can be positioned according to 
this parameter on geographic maps. An analysis of the origin and localization to- 
gether with other features makes it possible to classify sculptures and conduct 
comparative studies. 

Digitization and virtual representation of cultural heritage help to select the
scale and fine-tune the distance between the viewer and the object, defining the
optics and switching from detailed views to general patterns and vice versa.
Scaling in the digital environment can be implemented in the Zoom-in, Zoom-out
and Zoom-zero modes. In a certain sense, combining the modes in this project is
consistent with a multiscale approach to digital heritage (Pepe et al. 2020).


5 Zoom-out mode 


The analysis of the spatial representation of the sculpture at the maximum
distance from the researcher is mediated by the map and carried out in Zoom-out
mode. At this distance, the subject of study is not the object itself but rather
the data about it. The study of the origin and localization of the sculptures is
based on metadata, descriptions of origin and geographic locations. Understanding
the data requires digital tools to collect, organize and process it.
Geoinformation technologies make it possible to study the localization and
distribution of sculptures and to identify common and specific features of the
objects for particular localities. Adjusting the optics and switching to Zoom-in
mode helps to analyze the origin in conjunction with the features of the sculpture
such as its type, the appearance of the saints, distribution by topics and the
symbolism of objects and their elements. In this case, the image of the object
becomes an important source for developing and deepening the research. Some
sculptures of saints, and especially of Christ, show a visible similarity to the
local peoples (Vlasova 2006). For example, the faces of the religious sculptures
from the territory of the Komi-Zyryans have wide, prominent cheekbones, which were
characteristic of the local population. A potential study on the correlation
between the specific features of sculptures, their localization by provenance and
the anthropological characteristics of the local population inevitably requires
zooming out for a generalized view of metadata and zooming in for a comparative
analysis of the facial features.

In the digital space, the most common mode for representing large and
medium-sized objects is the Zoom-out mode. In virtual space, viewing opportunities
are limited by the size of the screen. To see the entire object on the screen, the
Zoom-out mode is used, which visually distances the object significantly from the
viewer. The size and quality of the display of a mobile device or a personal
computer monitor set a limited framework for interaction with an object. The real
size of the object observed in virtual space is no longer obvious because the human
eye sees an object reduced in size compared to its real physical size. As a result,
when the user gets closer to the object by scrolling in, it is difficult to
determine where the Zoom-out mode switches to Zoom-zero and Zoom-in modes.

The Zoom-out mode has obvious advantages because it allows users to ob- 
serve the shape of an object in general, to see some of its general features. With 
regard to architecture, for example, only in the Zoom-out mode can one track the 
features of the structure of an object, classify it according to the specifics of the 
shape and organization of elements. This circumstance is essential for the analysis
of the iconostasis in our case study.

The iconostasis of the Transfiguration Cathedral in Perm has impressive
dimensions: 22 meters in height (25 meters with the cross on the top of the
iconostasis) and about 15 meters in width. The iconostasis has three tiers and
contains 21 designated places for icons. Along with the icons there are paintings
on biblical subjects. Only 19 images out of 21 have survived; two pictures were
lost over time. The lost images are the icons of the Mother of God and the
Evangelists Matthew and Mark. The general idea and structure of the iconostasis
can be seen in the Zoom-out mode. The dense point cloud of the 3D model of the
iconostasis has been published online (3D model of the iconostasis 2022). Viewers
can assess the scale of the masterpiece of wooden architecture, see and understand
the general logic of space organization, and analyze the hierarchy of plots
depending on the canon and their location in the iconostasis.
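
How strongly the Zoom-out mode reduces such an object can be estimated as a representative fraction, that is, the ratio between the displayed height and the real height. The following minimal sketch uses the 22-meter height given above; the display heights are hypothetical values chosen for illustration, and the calculation assumes that the model fills the screen vertically.

```python
def representative_fraction(object_height_m, displayed_height_m):
    """Return the denominator N of the scale 1:N at which an object is shown."""
    if object_height_m <= 0 or displayed_height_m <= 0:
        raise ValueError("heights must be positive")
    return object_height_m / displayed_height_m


ICONOSTASIS_HEIGHT_M = 22.0  # height without the cross, as stated in the text

# Hypothetical visible screen heights in meters (assumptions, not measurements).
displays = {"phone": 0.14, "laptop": 0.20, "desktop monitor": 0.34}

for name, height_m in displays.items():
    n = representative_fraction(ICONOSTASIS_HEIGHT_M, height_m)
    print(f"{name}: about 1:{round(n)}")
```

Under these assumptions the whole iconostasis is shown at a scale somewhere between roughly 1:60 and 1:160, which illustrates why its real size is not obvious on the screen.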

Also, in the Zoom-out mode, it is possible to assess in a general way the inter- 
nal space of the cathedral in combination with interior items, primarily in rela- 
tion to the iconostasis, the central object in the cathedral. 

In the physical space of the cathedral, the view of the iconostasis is difficult 
due to the narrowness of the space and the closeness of the walls to it, as previ- 
ously mentioned. It is impossible not only to see the iconostasis as a whole, but 
also to appreciate the harmony of the space of the cathedral and the masterpiece 
of architecture. In a virtual environment, new perspectives are opened up for the 
implementation of such visualization and representation of contexts. It becomes 
possible to model the space of the cathedral in the condition before the installa- 
tion of the walls, to open an overview of the entire iconostasis and the space in 
front of it. According to the preserved visual historical sources such as photo- 
graphs, plans and sketches, it will be possible to recreate the space of the cathe- 
dral for the period before the reconstruction via digital means. 


6 Zoom-zero mode 


The Zoom-zero mode is useful for a deeper immersion of the viewer (researcher)
into the contexts. The religious sculptures were often carved to human height and
were realistic in terms of anatomy and proportions. A context for the sculptures
was created within the church, which could be changed depending on religious
holidays. Sculptures were an important part of the rituals and activities in the
church. The sculptures and sculptural compositions had an effect on the
parishioners of the church and made them feel like witnesses to biblical events.
On various religious holidays in churches, sculptures could be draped in clothes.
The surroundings of the sculptures could be changed by lighting candles and
placing them in a certain order around the sculptures, creating a play of light
and shadow.

Recreating the environment in which the objects existed via digital means will
allow users to interact with objects on a qualitatively new level, and the
Zoom-zero mode will provide an opportunity to see the object as an eyewitness to
events.

Reconstruction of the space in the digital environment where the sculptures and
the iconostasis existed requires a whole complex of historical sources. In this
regard, it is critically important to know the dimensions and proportions of the
churches and parishes from which the wooden sculptures were brought, as well as
their interiors, in order to restore a realistic image of the environment that
surrounded the sculpture and the context of its original existence. Unfortunately,
for only a small part of the sculptures have visual sources been preserved, in the
form of photographs depicting the places where they were located in churches. On
the basis of these photographs, preserved descriptions and the memories of
collectors, it is possible to partially reconstruct the space in digital format to
expand the possibilities of analysis of the objects in the digital space.

The reconstruction of the natural environment of these objects is needed to 
recreate the authentic experience of the visitor at the churches. In doing that, it is 
necessary to use special virtual environments where the problem of scale can be 
addressed. Such virtual environments can be shaped on the basis of digital plat- 
forms for publishing well-documented and annotated 3D models online and com- 
plex environments such as virtual reality (VR). 

The Zoom-zero mode can only be achieved as a presence effect, when viewers can
relate the object to themselves and perceive the size of both. In a sense, this
can be achieved with the help of VR technologies. Modern platforms for
representing 3D objects and VR allow users to set up the optics and switch between
modes according to their tasks, and thus to set the optimal scale for specific
objectives. In the SketchFab environment, it is possible both to set up the
display of an isolated object in a digital environment and to scale its
representation in VR and AR. There the physical dimensions become meaningful:
there is something to compare the object with, and the user sees the object at its
real size, authentic to the original. Moreover, the height of the viewer can
easily be taken into account by changing settings in the software or hardware. For
example, most VR glasses include a set of height settings that adapt the VR
content to the physical height of the viewer (player) to make the experience more
realistic.

In some sense, the implementation of the Zoom-zero mode in the creation of VR
content echoes the film-making process, where producers tend to avoid using zoom.
The “Dolly Zoom” (Vertigo Effect or Hitchcock Zoom) was popularized by Alfred
Hitchcock and then applied to quite specific scenes in filmmaking (Hitchcock’s
Re-released Films 1991). As an artistic effect, it is used to achieve a particular
psychological effect, to attract the viewer's attention to certain details, to
visualize implicit processes, etc. However, this effect causes significant
distortion and flattening of the image (Vermeulen 2018). That is why the modern
film industry prefers to capture video with no zoom, achieving the needed effect by
choosing the camera position and its distance to the object. In virtual reality,
using zoom may cause discomfort to viewers due to unrealistic camera movement or
specific optical effects that distort perspective and disorient observers.
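
The geometry behind the dolly zoom can be sketched with the standard pinhole-camera relation: a subject of width w keeps filling the frame at distance d only if the horizontal field of view equals 2·atan(w / 2d). The example below is a simplified illustration under that assumption; the function name and the subject width and distances are hypothetical.

```python
import math


def fov_for_constant_framing(subject_width_m, distance_m):
    """Field of view (degrees) that keeps the subject filling the frame width."""
    if subject_width_m <= 0 or distance_m <= 0:
        raise ValueError("width and distance must be positive")
    return math.degrees(2 * math.atan(subject_width_m / (2 * distance_m)))


# Hypothetical shot: a 2 m wide subject while the camera moves back from 2 m to 8 m.
for distance in (2.0, 4.0, 8.0):
    fov = fov_for_constant_framing(2.0, distance)
    print(f"distance {distance} m -> field of view {fov:.1f} degrees")
```

The subject stays the same size in the frame while the field of view narrows, so the background appears to change scale behind it. This is the perspective distortion that a fixed field of view, as in the Zoom-zero mode, avoids.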

Additionally, when using VR in Zoom-zero mode, it becomes possible to see the
space of churches and wooden sculptures through the eyes of parishioners of the
past. It is important to note that the sculptures occupied different places in the
church and were placed at different heights. For example, they could be put on
pedestals, in wall niches, or on the floor of the church near the altar or in other
spaces. The position of the sculptures is significant not only in terms of the
hierarchy of saints, for example, but also because certain features of a sculpture
can only be seen in its authentic location. Consciously or unconsciously, the
sculptures of saints were given visual characteristics that made them similar to
the local population. For some of the sculptures, this resemblance to
representatives of the Komi population is visible to the naked eye. Other
sculptures require the recreation of an environment and a certain angle of view,
natural to their position in the church, in order to see this similarity.

Looking deeper, the time of year and the time of day chosen for such a
reconstruction are also important. Recreating the lighting, such as daylight
breaking through the window glass and the light of oil lamps and candles, is
likewise a necessary condition for reconstructing the authentic environment in
which the object existed.

However, for most sculptures, placement in a natural context can be quite limited,
and a visual reconstruction of the authentic environment is problematic due to the
lack of historical sources. Many sculptures were brought from distant churches, and
there is no record of where they were located, what significance they had for the
decoration of the church, or whether they stood permanently in the church or were
brought in on holidays for decoration. It is known that in some cases the sculptures
were removed from churches and hidden for various reasons. The sculptures were not
always created according to the canon, and they were often worshiped as deities,
which is in tune with pagan religions.


7 Zoom-in 


The Zoom-in mode is necessary to view the object in detail, moving closer to it and
observing its features and attributes more meticulously. Applied to the iconostasis,
the Zoom-in mode provides an opportunity to see the framed images which represent
biblical scenes and to analyze in detail the iconography and combinations of colors
used, which is especially important for comprehensive research on the iconostasis
as a piece of art (Vlasova 2006). It therefore opens up new perspectives for
studying the object, moving from general observation in Zoom-out mode closer and
closer to the individual elements of the iconostasis.

With regard to manufacturing techniques and the carved parts of the iconostasis,
digital technology creates additional opportunities for study. Figure 3 shows a
fragment of a dense cloud of points extracted from the array of iconostasis scans.
A dense point cloud is a set of points captured by 3D scanning; each point has
coordinates that locate it in space and so help to orient the object. In the picture
we can see fragments of carvings, some of which are repeated on other parts of the
iconostasis. Analysis of the dense point cloud and its parts will help to classify
fragments, identify regularities and predict possible repetitions in the lost parts
of the ornamental carving (Tari 2016).


Figure 3: Fragment of a dense cloud of points created on the basis of laser scanning of the
iconostasis. Screenshot of the point cloud was made by Nadezhda Povroznik.
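
The notion of a point cloud as a set of coordinates can be made concrete with a
minimal sketch. The Python fragment below is an illustration only: the coordinates,
the unit (metres) and the function name are hypothetical and are not taken from the
iconostasis scans. It stores a cloud as a list of (x, y, z) points and derives the
simplest measure of an object's extent, its bounding box.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def bounding_box(cloud: List[Point]) -> Tuple[Point, Point]:
    """Return the minimum and maximum corners of the axis-aligned box around the cloud."""
    xs, ys, zs = zip(*cloud)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Hypothetical sample points (assumed to be in metres), not real scan data.
cloud = [(0.0, 0.0, 0.0), (0.5, 1.2, 0.1), (0.3, 2.0, 0.05)]

lo, hi = bounding_box(cloud)
# Extent of the fragment along each axis: width, height, depth.
size = tuple(h - l for l, h in zip(lo, hi))
```

A real scan contains millions of such points, but the principle is the same: every
later step, from classification to the search for repeated ornament, operates on
these coordinates.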

In addition, digital means can be used to reunite the main body of the iconostasis
with its separated parts. Some fragments of the carving have fallen away from the
iconostasis and become independent storage items in the gallery, such as the Cherub
in an ornamental frame (Cherub in an ornamental frame 2022). Other elements were
lost; their partial restoration is possible on the basis of studying the repetition
of decorative elements, as mentioned earlier with reference to the relevant study
(Tari 2016). Reconnecting the separated and restored fragments with the body of the
iconostasis is becoming achievable with the help of digital technologies and
subsequent representation in the virtual environment.

Zoom-in and Zoom-zero modes allow users to see novelty in sculptures and 
open up perspectives for an interdisciplinary approach to analysis. Research at the 
intersection of medicine, art and the humanities expands the possibilities of artistic 
interpretation of sculpture. Interpretations can be made of certain features of the 
sculptures from a medical point of view, which will deepen the context and under- 
standing of the sculpture by the viewer. Felipe C. Cabello’s research on the Isen- 
heim altarpiece provides a definite direction in the medical interpretation of 
religious painting and sculpture (Cabello 2018). 

Since some of the sculptures are made with a high degree of realism, the Zoom-in
mode makes it possible to analyze in detail the ways in which the context of events
is represented in the sculpture. In the sculptor's conception, the character
embodied in the sculpture was an important figure in the plot. For example, the
sculpture of Christ in the dungeon (Seated Savior 2022) belongs to the Passion
cycle, a subject that was widespread in wooden sculpture in the eighteenth and
nineteenth centuries. The gospel story associated with the sculpture describes
Christ being held in prison before ascending Golgotha. According to the plot, Christ
was beaten in the face with sticks. When looking closely at the sculpture, you can
see what artistic techniques were used and what colors and palette the sculptor
chose to depict injuries and bruises.

By applying medical interpretations, contextualization can be significantly
deepened, and we can address the sculptures through the plot and the events that
happened to Christ. One detail open to such interpretation is the bulging belly
depicted by the sculptor. The artist rendered the emaciated body of Christ, the pale
color of the skin, with deathly greenish and yellowish shades indicating extreme
exhaustion, the blood under the crown of thorns, etc. Analysis and medical
interpretation of these details will significantly complement cultural
interpretations and help to explain and verbalize the sensations conveyed by the
sculpture.

Information technology and Zoom-in modes make it possible to visualize details that
are hidden from the unaided eye and to deepen the interpretation of the sculpture.
With a full-fledged 3D model of an object, it is possible to extract the mesh to
analyze the shape of the sculpture. The mesh is the structural layer of a 3D model:
it represents the geometry of the model and consists of polygons. Separating the
mesh from the texture layer that covers it helps to consider the shape of the object
separately from the painted surface. The pigments hide the underlying layers, so
some important details that also require attention remain invisible.
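
The separation of geometry from texture described above can be sketched in a few
lines of Python. This is a simplified illustration under stated assumptions: the
class name `Model3D`, the per-vertex color layout and the sample square are all
hypothetical, and real 3D formats store texture in more elaborate ways. The point is
only that a measurement of shape, here the surface area, can be computed from
vertices and polygons alone, without consulting the painted layer.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]
RGB = Tuple[int, int, int]


@dataclass
class Model3D:
    vertices: List[Vec3]  # geometry: positions in space
    faces: List[Face]     # geometry: triangles given as vertex indices
    colors: List[RGB]     # texture layer: one color per vertex (assumed layout)


def surface_area(vertices: List[Vec3], faces: List[Face]) -> float:
    """Sum the areas of all triangles; the color layer is never consulted."""
    total = 0.0
    for a, b, c in faces:
        ax, ay, az = vertices[a]
        bx, by, bz = vertices[b]
        cx, cy, cz = vertices[c]
        u = (bx - ax, by - ay, bz - az)
        v = (cx - ax, cy - ay, cz - az)
        # Half the length of the cross product is the triangle's area.
        cross = (u[1] * v[2] - u[2] * v[1],
                 u[2] * v[0] - u[0] * v[2],
                 u[0] * v[1] - u[1] * v[0])
        total += 0.5 * math.sqrt(sum(k * k for k in cross))
    return total


# Hypothetical model: a painted unit square built from two triangles.
model = Model3D(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    faces=[(0, 1, 2), (0, 2, 3)],
    colors=[(200, 180, 150)] * 4,
)
mesh_only = (model.vertices, model.faces)  # the "extracted mesh"
area = surface_area(*mesh_only)
```

Because `surface_area` receives only vertices and faces, changing or discarding the
colors leaves the result untouched, which is what makes the mesh a useful object of
study in its own right.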

Laser scanning is a very precise technology for capturing and measuring the shape of
an object. It helps to build the geometry of the object (the mesh) very precisely
for subsequent visual representation. Analysis of the extracted mesh can disclose
details that convey the realism of the sculpture, its physiology and naturalness,
and the carving techniques.

By separating the geometrical and textural layers of the sculpture using digital
means and analyzing mesh and color individually, it becomes possible to learn more
about the technique used by the sculptor. In the photograph of the sculpture of the
Seated Savior (Figure 4) we can see the colors of the surface, but the carving
technique itself is not so obvious. Figure 5 shows the mesh of the sculpture with a
high polygon count. 3D laser scanning shows excellent results in spite of the
imperfect coverage of the scanned surface (the missed spots are indicated in green
in Figures 4 and 5). Artificial illumination of the mesh highlights the curves of
the body and the relief of the muscles, emphasizing the strength of the emaciated
body. It is also feasible to conduct comprehensive research to understand which
effects the creator achieved through wood carving and what was emphasized with
paint.


Figure 4: Photograph of the sculpture of Christ in the dungeon (Seated Savior). Photo by Nadezhda
Povroznik.


Figure 5: Mesh of the sculpture of Christ in the dungeon (Seated Savior) created using laser
scanning. Screenshot of the mesh was made by Nadezhda Povroznik.

The Zoom-in mode can be taken deeper still with additional technologies, including
tomography and X-rays. With these technologies, it is possible to look inside the
sculpture and reveal structural features hidden from the human eye. X-ray studies of
the wooden sculptures were already carried out in the 1980s to examine how a
sculpture was made, its internal system of fastenings and its internal state. These
technologies were used to analyze the physical state of the wooden sculptures, to
detect cavities, to trace changes in physical condition and to determine, on the
basis of the results, the required restoration and preservation measures. Each
sculpture is a complex construction that may consist of dozens of parts. The system
of connections between these parts cannot be distinguished from the outside either
(e.g. nails and cotter pins remain invisible to the unaided observer, see Figure 6).
In addition, the problem of preservation is associated not only with the state of
the paint layer but also with internal lesions and defects. The radiographs obtained
as a result of the study can be considered pieces of art in their own right. The
conclusions drawn by the radiologists in interpreting the images deserve special
attention. Despite the fact that the objects under study had inventory numbers, the
radiologists used Christian terminology in their conclusions, such as “X-ray of the
hand of the Savior” or “X-ray of the thigh of the Virgin”. Subsequently, the images
from the X-ray procedures were exhibited in the gallery as independent objects of
art as part of the “Towards the Light” program (The Perm Gallery will show . . . 2015).


Figure 6: X-ray of the sculpture. Saint Nicholas of Mozhaisky. 19th century. 1980. The study was
conducted by Dr. Professor A.I. Novikov.


8 In conclusion


The digitization of cultural heritage should be planned from the outset with the
possibility of scaling the project in mind, laying the foundation for a diverse
representation of objects. We should work towards a special digital environment, an
infrastructure in which it is possible to adjust the optics and switch from mode to
mode, from Zoom-out to Zoom-in through Zoom-zero and back, according to particular
goals. That is, a digital resource should be adaptable to specific purposes. This
approach allows research optics to switch from the analysis of details to local and
even global generalizations. By adapting the optics in this way, it becomes possible
to deepen the interpretation of sculptures and their contexts. The scalability of
the elements in such a digital environment makes it possible to combine the Zoom
modes and understand the same topic from different distances.

Furthermore, the adjustment of optics is essential for audiences whether re- 
searchers or virtual visitors of the gallery that will interact with digital cultural 
heritage. It is important for the cultural institution to create and tell stories and 
expose them to viewers, to different audiences. This is also essential for research- 
ers to discover something new, previously unexplored, to ask new questions, to 
dig into the contexts deeply, to offer an interpretation of the details and the way 
to put together a holistic picture. 

The invention of such a digital environment that comfortably switches be- 
tween Zoom modes inevitably faces numerous challenges. Many of them are di- 
rectly related to the digital realm, including the authenticity of sources in the 
digital medium, computer-mediated experience, screen-mediated interaction and 
others that shape the limits of our understanding of heritage. 

Digitization of the objects and reconstruction of the environment alone are not
enough to understand all the cultural layers and contexts of the sculptures'
existence. It is necessary to know the traditions and local customs, which often
differed from canonical practice due to the late Christian colonization of the
region, the persistence of pre-Christian customs, and the mixing and symbiosis of
Christian and pagan cultures. A further important topic for discussion is religious
ethics in the digital environment.

The next step planned after the digital reconstruction of the iconostasis is to
restore the interior of the Transfiguration Cathedral, to show what it looked like
at the beginning of the 20th century, prior to the installation of additional floors
and walls for the gallery’s exhibitions. It is also planned to recreate the space
behind the Holy Gates. In Orthodoxy, there is a religious ban on women entering the
space behind the iconostasis. In relation to this prohibition, the issue of
religious ethics in the digital world arises as an important topic for subsequent
discussions.


Scale Exercises: Listening to the Sonic
Diversity in 5000 hours of Swedish Radio
with Computers and Ears


Abstract: This article explores the significance of scale within the field of audio 
analysis. The introduction of digital signal processing methods is today enabling 
large-scale processing of recorded sound, which in turn provides access to vast 
amounts of unexplored audiovisual data. It is now possible to zoom the sounds of 
our past. In order to highlight both affordances and limitations of these new 
methods, this article studies 5000 hours of Swedish radio from the 1980s. By 
adopting computational tools from bioacoustics, linguistics and musicology it be- 
comes possible to study trends and developments in the acoustic style of broad- 
casting. This provides insight into the changing characteristics of public service 
media in the era of de-monopolization. However, to achieve these insights, the 
historian needs to practice the sonic scales. 


Keywords: radio history, audio analysis, sound studies, signal processing, media 
studies 


1 Introduction 


However, based on this material, it is not possible to say anything about the general character 
of the individual radio channels, nor about any specific aspect of content (Åberg 1996: 17).


This rather pessimistic note concludes a methodological summary written by the
Swedish media scholar Carin Åberg in 1996. Her ideas, though radical, have remained
on the periphery of media studies ever since. She was running up against a problem
of scale. Inspired by the work of German radio scholar Detlef Schröter, Åberg wanted
to approach the medium not as an information channel, but as a matter of design.
Radio was simply the sum total of sounds that people wanted to hear. This meant that
radio had to be studied beyond the textual and discursive realm, as “sound in time”
(Åberg 1999). However, studying the actual flow of radio sounds quickly revealed
itself to be a very laborious and difficult task. Manual coding was not only
time-consuming, it also posed issues of precision. This led Åberg to conclude that
“there just aren't any tools available for sound analysis this date. Visual and
textual media rules not only science, but society at large - and the methods, and
theories for understanding sound are simply lacking” (Åberg 1996: 18-19).

The following article seeks to return to Åberg's discouraging conclusion. Her battle
with the problem of scale anticipates our contemporary position. Recent humanities
scholarship related to sound has again addressed the problem of scaling audio data
(Bull and Cobussen 2021). Yet today, it is possible to engage the issue through
different means. By transposing these questions into the realm of signal processing,
my work aims to redeem some of the methodological ideas of Åberg, whilst at the same
time situating audio data within the debates of digital scholarship. My argument is
that digital audio renders zooming not only a possibility but a necessity, in a way
that re-actualizes Åberg’s ambition. Upon tapping into the vast information
aggregate of digital sound archives, the scholar unavoidably shifts through a
variety of points of perspective, calling for a self-reflexive approach to “digital
hermeneutics” (Clavert and Fickers 2021). In contrast to textual and visual
information, the media we use to scale sound-related data constantly involve
multiple, interlocking modalities. In order not to remain unreflective about this
epistemological eclecticism, it is necessary to return to the question of scale with
regard to audio data. The following chapter analyzes a set of audio data by means of
various computational methods, zooming in and out on a dataset of 5000 hours of
Swedish broadcasting from 1980 to 1989. By means of signal processing, the analysis
explores trends and variations in the data on four different scale levels. The
overarching research question concerns the status of sonic diversity within the data
and how it develops over time. Diversity was a guiding principle for Swedish
Broadcasting throughout the last decades of the twentieth century. Prior research
studied the results on an organizational level, but the actual sonic content remains
unexplored. My analysis maps several aspects of diversity within the audio and
tracks the development over time. In doing this, the purpose is both to demonstrate
the affordances and limitations of digital audio and to contribute to the
understanding of the sonic development of broadcasting media. The process is
intended to demonstrate the capacities of digital audio, and is, in this sense, a
sort of media archeological experiment on the level of signal content (Fickers and
van den Oever 2019).


2 The scales of audio data 


Sound has regularly been diagnosed as a peripheral modality in humanities research
(Thompson 2002, Smith 2004, Sterne 2011). Nonetheless, there have been significant
changes in the way sounds are processed, stored and analyzed since the turn of the
millennium. For almost two decades, digital formats have harbored a larger share of
the total human sonic cultural heritage than all analogue media combined. This is
driven both by the escalating output of new digital content and by extensive
digitization projects across sound archives all over the globe. This radically
changes the conditions under which culturally produced sound can be studied. No
longer are we dealing with material objects and feeble radio transmission. Instead,
the radio scholar, and anyone else with an interest in sounds, is confronted with a
vast repository of preformatted information. These strings of acoustically
interpretable data are condensed as digital audio files, allowing the researcher to
ask entirely new questions about the history of sound. Simultaneously, these very
files pose new questions back to the researcher.

This media theoretical feedback loop raises the question of what audio data really
is, and to what degree it is different from other types of data. As Wolfgang Ernst
has pointed out, in terms of digital storage, there is only data. The computer makes
no fundamental distinction between textual and sonic content (Ernst 2013).
Nevertheless, audio data pertains specifically to a certain modality of human
perception, posing a set of more or less unique difficulties. This article suggests
that audio data, as information captured about, and intended to render, acoustic
phenomena, ought to be distinguished empirically in at least three significant ways.
All three aspects pertain to the matter of scale, and thus the article aims to
provide an introduction to the multi-scalar problems and prospects of audio
analysis. Whilst the subsequent analysis in this paper demonstrates these aspects,
this first section provides a more general overview of the argument.

There is no unmediated approach to the study of recorded sound, just differ- 
ent scales of mediation. This might at first glance appear as an oxymoronic state- 
ment. However, there is a firm tradition of distinguishing between machinic and 
human analysis, the same way there is a strong emphasis on the difference be- 
tween qualitative and quantitative approaches. Both these distinctions indicate a 
methodological dis-entanglement which digital media does not allow. There is no 
access to digital audio which is not fundamentally quantitative and machine- 
aided. There might be elements of what we associate with human hermeneutics, 
but the very access to the sound file is an instance of complex calculation. It is 
not only a case of time-axis manipulation, allowing for playback and reversal — 
digital sonic mediation always takes place under the regime of signal processing 
(Kittler 2017). Information becomes manipulable and self-generating. The order of
bits can be reorganized in whatever way fits the question. Instead of a recorded sentence
of speech, we can listen back to all the consonants in alphabetical order, or only 
the breaths in between. This means that the supposed sound object under study 
decomposes into a tangible, zoomable meshwork of information (Ernst 2016). It 
remains the responsibility of the researcher to consider the stages of mediation 
with a certain amount of “self-reflexivity” (Clavert and Fickers 2021).

There is also no absolute ‘close’ or ‘distant’ approach to audio data, only the
interaction of several related scale steps. This is not a problem exclusive to audio
data. In text-oriented research, “scale” has been considered the “single most
important issue” of digital transformation (Hayles 2012: 27). The conclusion,
according to N. Katherine Hayles, is the task of finding “bridges” between distant
analysis and close reading (Hayles 2012). Despite attempts to translate this
vocabulary into the sonic realm, the media condition under which sound is processed
in computers does not support such a distinction (Clement 2019, Mustazza 2018).
Whilst the basic element of a text can be reduced to a symbol or a letter, proper
‘close listening’ would entail listening to phonons, the absolute quantum level of
sonic vibration (Arrangoiz-Arriola et al. 2019). The contemporary standard of
pulse-code modulation, which informs most digitally stored sound, sets a limit by
sampling the sound wave at a fixed rate, typically 44,100 times per second. Staying
properly close to the digital sound would thus result in many thousands of
observations for every second of speech. The same can be said regarding the
frequency register of sound. Many of the vibrational forces we refer to as
acoustical phenomena take place both above and below the perceptual spectrum of
human listening. When sonic artifacts are the subject of study in the humanities,
they are seldom considered beyond the spectrum of the audible and perceptible. This
means that digitally stored information about sound waves does not allow for the
tropes of distance and closeness which have pervaded digital scholarship. Rather,
the sound scholar engages with a medium that represents sound through several,
relatively anthropocentric, interrelated scale steps.
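
The scale of "staying close" to sampled sound can be illustrated with simple
arithmetic. The following Python sketch is an illustration under stated assumptions:
it takes 44,100 Hz, the common CD-audio sampling rate, as its default, and the
function names are hypothetical. The rate is a parameter precisely because archival
files may use other values.

```python
def sample_count(duration_s: int, sample_rate_hz: int = 44_100) -> int:
    """Number of amplitude observations stored for a mono recording of given length."""
    return duration_s * sample_rate_hz


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


one_second = sample_count(1)              # observations in one second of speech
archive = sample_count(5000 * 3600)       # observations in 5000 hours of radio
ceiling = nyquist_limit_hz(44_100)        # upper frequency limit in Hz
```

Under these assumptions a single second holds 44,100 observations and a corpus of
5000 hours roughly 794 billion, which is why no listener can remain at the level of
the individual sample and why intermediate scale steps are unavoidable.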

Finally, there is no purely monomodal approach to audio data. Zooming on 
sound takes place through different scales, in visual and auditory realms inter- 
connectedly. The audio data being subject to analysis is the result of physical 
sound waves entering a digital information processing system, but “unlike tabular
data and image data, it does not follow a very clear and organized structure”
(Smallcombe 2022). Whilst images and text are well-developed and deeply inte- 
grated into today's machine learning technologies, the messy waves of sound re- 
main somewhat elusive. This further motivates a careful distinction between 
audio data and other modalities of storage. Whilst there are experimental efforts 
in the field of data sonification, researchers rarely find the analytic necessity to 
transform images or text into sound. In contrast, sound constantly and repeatedly 
passes the threshold to the image. This can be regarded in a long tradition of 
acoustic analysis. Already in 1927, philologist Alois Brandl could suggest that the 
human ear was inferior to the eye in the study of speech (Ernst 2016: 115). Sonic 
material is not only an object for our listening. It constitutes a multimodal experi- 
ence. This complicates things even further, because in the process of scaling 
sounds, the visual domain must be considered simultaneously.

To summarize, all audio analysis takes place on a scale involving both human
and machinic participation. Through the interdependence on signal processing, 
several scale levels in the data are engaged simultaneously. Audio data further- 
more engages the multi-scalar dimension of visual and sonic representation. With 
this in mind, sound-oriented research must develop further sensibility towards 
its epistemic object, advancing a specific type of “digital hermeneutics” (Clavert 
and Fickers 2021). This requires both technical know-how, as well as the capacity 
to reflect on, and represent the methods of inquiry. Though partly overlapping, 
the matter of audio scalability departs from the standards of visual and textual 
analysis. As the remainder of this article aims to demonstrate, understanding the
interrelated scales of audio analysis requires concrete practical experience.
Initially, the analysis shifts between several levels of frequency-based analysis,
studying both harmonic shifts in the entire audio data and specific noises from a 
more granular perspective. The final section of the analysis shows how frequency 
scale levels can be complemented with rhythmic pattern recognition, focusing on 
the time domain, rather than the frequency domain in the recordings. Such an 
endeavor requires scale exercises. 


3 5000 hours of radio 


The Swedish media database constitutes the exemplary model of a digital sound
archive, ripe for multi-scalar exploration. Sweden was one of the first countries to
apply a rule of legal deposit to broadcasting content. In the late 1960s, plans for
a new archival institution were under discussion in the Swedish library sector
(Snickars 2015). This was supposed to be a radically modern archive. The impulse to
capture the development of mass media was not unique, but the Swedes displayed a
level of ambition that was rare at the time. At the annual IASA (International
Association for Sound Archives) conference in 1975, the subsequent head of the
broadcasting archive, Claes Cnattingius, proudly declared this radical stance:
“modern media [. . .] like radio [. . .] contain important information, which should
be preserved to the same extent as written material” (Cnattingius 1975: 27).
According to the official register at IASA, Sweden was amongst the first countries
in the world to take on this daunting task, and the result is one of the most
extensive broadcasting archives in Europe. Since 1979, all radio broadcast by the
Swedish public service has been recorded and preserved at the National Library.

Today, the majority of the collection is digitized, constituting a vast repository
of digital information. This analysis samples roughly 5000 hours distributed over
the decade between 1980 and 1989. The material consists of 15 randomized complete
weekdays of broadcasting from each channel, every year, each one consisting of
roughly 18 hours of sound. This amounts to about 500 hours of sampled data every
year. This is of course a small sample in comparison to the totality of the
broadcasting material, yet it multiplies the sample size used in Åberg’s study by
50, which should prove sufficient for experimental results.

The sample is collected evenly from two separate channels within the Swed- 
ish public service monopoly, P1 and P3. The two channels were at the time com- 
peting over the major audience base. P1 is the ‘flagship’ and the first channel in 
Swedish broadcasting history. P3 was introduced in 1964 and was supposed to 
offer a more youthful appeal. The introduction of P3 was partly an attempt at cre- 
ating variation and competition within the radio monopoly. At this point, there 
were diverging views on public service broadcasting and the risk of a monotone 
output drove organizational reconfiguration. Where many other countries re- 
solved this situation by the gradual introduction of commercial alternatives, Swe- 
den would sustain the radio monopoly until 1993. Up to that point, however, the 
matter of variation was in focus on the public service agenda. This is also the
context within which Carin Åberg was writing. Her research reacted against what she
perceived as a reductive embrace of media diversity, or as her contemporaries
sometimes referred to it, “relative entropy” (Åberg 1996). Picking up Åberg's
critical approach to radio diversity, this analysis pertains to diversity in the signal
content itself. Whether the last decade of measures for increased variation within 
the monopoly was successful or not has been up for debate. Prior research has 
studied this late-stage radio monopoly through organizational and audience- 
based perspectives, but far less attention has been paid to the actual sonic con- 
tent. By deploying questions related to sonic design and the expression of radio, 
as proposed by Åberg, it is possible to investigate how the last decade of the pub- 
lic service monopoly sounded. 

The analysis follows in four stages, each employing a new set of algorithmic 
methods in order to extract and represent different aspects of the audio data. 
There is no overall method, but the employed techniques are all discussed and
explained in the respective sections. The methodological approach consciously
borrows methods from other fields of research in an experimental manner. Though 
there is great virtue in developing methods from scratch within digital humanities 
research, there is also a need for sustainability and reuse. Therefore, it is important 
to consider how digital scholarship can build on previously established work. The 
ambition of the following analysis is to maintain critical reflection around the 
methods of algorithmic processing, whilst simultaneously expanding on the historical
results.


4 Sonic diversity in the total data set over time


On this initial scale step, the data presents itself as thousands of hours of unseg- 
mented signal values. The task is to consider how the question of diversity can be 
considered on an abstract level. One answer comes from the field of computa- 
tional bioacoustics. This area of research is focused on machine-aided analysis of 
acoustic communication amongst animals. It is an endeavor that entails working 
with large sets of data. Bioacoustics was thus also quick to adopt digital process- 
ing techniques into its toolbox. Today, there is a rich repository of methods within 
the field, from which humanities research surely can learn. In fact, with the specific
aim of measuring species, tools have been developed to explore
sonic diversity in large data sets. A common method is the “Acoustic Diversity Index”
(ADI). A computational approach to ADI was suggested by Farina Pieretti in 2016
(Pieretti et al. 2016). This method of measurement is featured in the approved 
computational bioacoustics library Scikit-maad (Ulloa et al. 2021). By processing 
visualizations of the audio content, the algorithms are trained on finding acoustic 
events. These events are collected and compared in accordance with Simpson's 
diversity index. The applied form is that of ‘Simpson’s reciprocal index’, where 
the outcome value simply indicates the total amount of different sonic events 
which the algorithm can detect within the file. The translation of this method to 
human-produced cultural data is essentially experimental, yet the results merit 
attention. Figure 1 displays the average values from each year in the sample data, 
distributed between the two channels. 
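
To make the index concrete, the following is a minimal sketch of Simpson's reciprocal index computed over a list of labelled sonic events. It is not the Scikit-maad implementation; the function name and the event labels are hypothetical and chosen only for illustration. Note that the value equals the number of event types only when all types occur equally often; otherwise it is better read as an effective number of types.

```python
from collections import Counter


def simpson_reciprocal_index(event_labels):
    """Return 1 / sum(p_i^2), where p_i is the share of events of type i."""
    counts = Counter(event_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one event is required")
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


# Hypothetical event labels, for illustration only.
even = ["speech", "music", "jingle", "applause"]
skewed = ["speech"] * 97 + ["music", "jingle", "applause"]

even_index = simpson_reciprocal_index(even)      # four equally common types
skewed_index = simpson_reciprocal_index(skewed)  # four types, one dominant
```

Both lists contain four types of event, yet the skewed list scores far lower, which is why the absolute values reported here are best read comparatively.
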
Figure 1: The average Acoustic Diversity Index from the total data of each year in the sample set, 
plotted chronologically. The ADI groups sonic events into species, and measures the level of variety. 
Both channels display a gradual increase in ADI, with P1 demonstrating a higher level of change. The 
X-axis displays the year in the data set and the Y-axis is the average Acoustic Diversity value in the data. 
Taken at face value, these results indicate a clear increase in diversity within both 
channels. The algorithm manages to detect 21 different types of sounds within the 
average P1 broadcasting day in 1980. Over the decade this appears to increase grad- 
ually, reaching a total average of 30 different sounds by the end of the decade. In 
parallel, the development on P3 appears similar, if slightly more moderate. P3 starts 
out with a comparatively high average of 29 different sounds, but ends the decade 
with a slightly smaller increase, amounting to a value of 37. As it 
appears from the sample data, both channels thus increase their repertoire of sonic 
types. However, P3 begins at a level of diversity that is only achieved by P1 towards 
the end of the decade. The result can thus be understood as a testimony of a clear 
difference in variation between the two channels. Yet, such an interpretation re- 
mains speculative, and the precise number of estimated sounds ought to be consid- 
ered with caution. It is rather the estimation of the algorithms based on one specific 
manner of segmenting and comparing spectrogram visualizations. However, since 
the same method has been applied throughout the data set, the relative values are 
arguably still comparatively valuable. Thus, the rather clear trend towards a higher 
number of different sounds is not entirely speculative in nature, but could provide 
an indication of the general development of the sonic content. 

This is however not the only way to measure the overall character of the data. 
A more signal-oriented approach, which has been applied in bioacoustics and 
computational musicology alike, is the measurement of the ‘Acoustic Complexity 
Index’ (ACI). In contrast to the diversity index, this analysis does not pertain to sonic 
events but instead measures the variation between short-time segments. Complexity 
in this sense refers to the amplitude variation within certain parts of the 
frequency spectrum. The results pertain to the degree of variation over time in sep- 
arate parts of the frequency register. It is thus a type of measurement less oriented 
around specific events and instead tuned towards the dynamic contrast over time. 
Figure 2 plots ACI values, in the same manner as the previous plot. 

Figure 2: The average Acoustic Complexity Index from the total data of each year in the sample set, 
plotted chronologically. ACI is a measurement of amplitude diversion in segmented parts of the 
frequency spectrum. The results demonstrate no significant variation over time, or between the 
respective channels. The X-axis displays the year in the data set and the Y-axis is the average 
Acoustic Complexity value in the data. 
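
The principle behind the complexity measure can be sketched in a few lines. The version below is a simplified illustration, assuming a spectrogram given as an array of non-negative amplitudes (frequency bins by time frames) and treating the whole recording as a single temporal step; it is not the routine used to produce the figures, and the function name is hypothetical.

```python
import numpy as np


def acoustic_complexity_index(spectrogram):
    """ACI of a (frequency bins x time frames) array of non-negative amplitudes."""
    spec = np.asarray(spectrogram, dtype=float)
    # Absolute amplitude change between adjacent time frames, per frequency bin.
    change = np.abs(np.diff(spec, axis=1)).sum(axis=1)
    # Total amplitude per frequency bin, used to normalise the change.
    total = spec.sum(axis=1)
    per_bin = np.divide(change, total, out=np.zeros_like(change), where=total > 0)
    # The index is the sum of the normalised changes over all frequency bins.
    return float(per_bin.sum())


# Hypothetical spectrograms: one constant, one alternating between two levels.
steady = np.ones((4, 10))
fluctuating = np.tile([1.0, 3.0], (4, 5))

aci_steady = acoustic_complexity_index(steady)
aci_fluctuating = acoustic_complexity_index(fluctuating)
```

A signal that never changes scores zero regardless of how many different sounds it contains, which illustrates why this measure and the diversity index need not move together.
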

There isn't a direct relationship between the diversity index and the complex- 
ity value in the data files. On the contrary, the complexity levels do not display any 
clear tendency at all. In the sample data from both channels, the average values 
vary irregularly from year to year. The highest scores in the analysis are found in 
P1 in the year 1983. It is however not so simple that one method provides a more 
reliable answer; instead, they supplement each other. The complexity measure- 
ment might appear cruder, but has for example been successfully applied 
within musicological scholarship, in ways that have granted new knowledge about 
the variation and unpredictability in popular music (Pease 2018). Prior research 
has also suggested that comparison between two or more indices produces more 
balanced results (Lopes and Machado 2019). Where the diversity index provides an 
indication of the total heterogeneity in the sonic data, the complexity value gives 
insight into the amount of variation over time. In the case of Swedish broadcasting, 
there appears to be a certain increase in the diversity of sonic events, which never- 
theless is not reflected in the dynamic over time. It is thus clear that the matter of 
sonic diversity is far from exhausted on this level of scale. Instead, the results 
compel us to study the audio data at a different level of granularity. Up until this 
point, it remains undisclosed what kind of sounds actually featured in the analysis. 
In order to better understand the tendencies demonstrated in these figures, it is 
crucial to consider the characteristics of the broadcasted content. Thus, the follow- 
ing section will change the perspective to the level of acoustic object recognition. 


5 Audio content as a historical indicator 
of radio style 


In the vast ocean of signal values constituting a sound file, there reside the ingre- 
dients for what human, cultural listening understands as certain culturally defin- 
able phenomena or objects. The machine-aided identification of such objects is 
often referred to as ‘acoustic object recognition’. This is a form of processing that 
relies heavily on pre-trained models, moving the analysis one scale step closer into 
the granularity of the data. Sound, transformed into a spectrogram image, is scanned 
for visual cues associated with certain sounds. There are today several such models 
available, all with different purposes and capacities. None of them constitutes a 
“one-size-fits-all” solution; instead, the tool has to be tuned to the purpose (Tsalera 
et al. 2021). The following part of the analysis will employ the ‘inaSpeechSegmenter’ 
toolkit. It was developed as part of the European Union's “Horizon 2020 research 
and innovation” program and is intended to be specifically applied to mass media 
content (Doukhan et al. 2018). In comparison to other models available, it has a 
rather limited categorization variety but compensates with high accuracy. The limi- 
tation is in this case also quite well adjusted to the goal. InaSpeechSegmenter classi- 
fies audio according to merely four categories: female speech, male speech, music 
and noise. The approach might appear restricted, but in fact, it ties in with a long 
tradition in radio studies. Since the work of Rudolf Arnheim, there has been a long 
line of media theory that considers the semiotic fundamentals of radio to be reduc- 
ible to precisely music, speech and noise (Arnheim 1933). Åberg herself applied this 
taxonomy in her analysis of radio structure (Åberg 1999). Thus, in Figure 3, the 
percentage distribution of these three categories is plotted for each year respectively 
throughout the decade. 

Figure 3: The total distribution between “Music”, “Speech”, and “Noise” in the content, displayed as 
percentage. The upper graph displays the results from P1 from each year in the sample set and 
exhibits a gradual reduction of music. The lower graph contains data from P3 and, though 
demonstrating overall higher levels of music, does not indicate any clear change over the decade. 
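
The step from segmentation output to a percentage distribution can be sketched as follows. The example assumes that the segmenter's result is available as a list of (label, start, stop) tuples with times in seconds; the label names, the helper function and the sample values are illustrative assumptions, not data from the study.

```python
def category_shares(segments):
    """Return the percentage of labelled time per category.

    `segments` is a list of (label, start_seconds, stop_seconds) tuples.
    """
    merged = {"female": "speech", "male": "speech"}  # fold both genders into speech
    durations = {}
    for label, start, stop in segments:
        key = merged.get(label, label)
        durations[key] = durations.get(key, 0.0) + (stop - start)
    total = sum(durations.values())
    return {key: 100.0 * value / total for key, value in durations.items()}


# Hypothetical segments covering 100 seconds of broadcast.
example = [
    ("male", 0.0, 30.0),
    ("music", 30.0, 50.0),
    ("female", 50.0, 90.0),
    ("noise", 90.0, 100.0),
]
shares = category_shares(example)
```

Summing durations rather than counting segments matters here, since a single long piece of music and a brief jingle would otherwise weigh the same.
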

The most striking difference between the two channels appears to reside in the 
distribution of music and speech. Already at the beginning of the decade, P3 con- 
tains almost four times the amount of music compared to the sibling channel P1. 
That P3 was, and still is, more music-oriented in its content is generally accepted 
and known. Yet by this type of analysis, it is possible to get a grasp of to what de- 
gree the two channels incorporated music and how it changed over the decade. 
Where the data from P3 ends at almost the same level as it started, P1 demonstrates 
a clearer development. The initial 16 percent of music gradually gives way to even 
more speech content. The result confirms prior media historical research which 
has proposed that music was effectively reduced in P1 broadcasting throughout the 
decade (Forsman 2014 and Weibull 2018). Despite not being a novel discovery, the 
accordance arguably lends credence to the methodological approach. Nevertheless, 
these results can provide nuance to previous media historical depictions. The redis- 
tribution of content between the channels has in prior research been considered as 
a “migration” of music, from P1 to P3 (Weibull 2018). ‘Migration’ implies a direct 
relationship and communal economy of content within the monopoly, resulting in 
an increase of music in P3 relative to the reduction on P1. 

This appears however not to be the case in the data analyzed in my study, 
which indicates a more independent process. The distribution between music and 
speech seems to display a different tendency on P3, with declining values of both 
throughout the middle of the decade. Towards the end of the decade, the values 
return to the previous levels. There is, in fact, an explanation for this curious ten- 
dency. In further attempts to diversify the radio monopoly, local broadcasting 
channels were established all over Sweden throughout the 80s. In 1987, a new, 
entirely separate channel was added to the list, hosting the local content corre- 
sponding with each region. However, up until then, local broadcasting was given 
airtime on P3, which meant that throughout parts of the day, the content of P3 
was controlled by several local stations. Since every region had its own output 
during these hours, the archive registers this as an absence of content in P3. This 
brings to mind the archival factors at stake in this pursuit. The very way in which 
the material has been archived will always be hardcoded into the results. Never- 
theless, it is still possible to determine that the spoken and musical content ap- 
pear to diminish to an equal degree throughout the period, thus indicating an 
unvaried distribution. The two channels seem to have influenced each other soni- 
cally, but not necessarily in the direct sense of content migration. 
By combining these values with the results from the previous part of the 
analysis, it is possible to further nuance the issue of sonic diversity. As concluded 
in the previous section of the analysis, P3 exhibited significantly higher variation 
in sonic content. It would be tempting to connect the distribution between music 
and speech to the level of sonic diversity. Yet, the results seem to refute this expla- 
nation. By comparing the chronological development of music in P1 broadcasting 
against the index values in the prior part of the analysis, the assumed correlation 
between musical content and sonic diversity is thrown into question. Only consid- 
ering the P1 data, music seems to have a negative effect on diversity value, yet 
combined with unchanging results of P3, the conclusion is rather an absence of 
any necessary relation between the category of music and sonic diversity. 

This reveals less about the general nature of music and more about the nature 
of computational audio analysis. For the computer, at least, the musical content 
played on Swedish radio at the time does not display any significant difference in 
sonic expression. The matter of sonic diversity remains elusive on this scale level. 
It raises the question of what resides within these encompassing categories. This is a 
good opportunity to direct attention to the third content category: noise. Due to the 
statistical underrepresentation of noises in the material, the category would be easy to 
overlook, yet it poses an interesting challenge to the matter of sonic di- 
versity. Less conceptually transparent than the categories of speech and music, 
‘noise’ contains most sonic events which fall outside their scope. In order to better 
understand the diversity values and the overall sonic profile of Swedish radio, the 
following segment will explore the granular level of this category. 


6 Granular analysis of radio noise 


Figure 4 represents the diversity in a sample of noises from P1 throughout the 
decade. The visualization is produced with the human-in-the-loop tool ‘edyson’, 
developed at the Swedish Royal Institute of Technology (Fallgren 2014). The tool 
has certain methodological similarities with the diversity index employed in the 
initial part of the analysis, yet allows for more flexibility. The audio stream is seg- 
mented into bins, which are then processed visually. The plot is generated by 
mapping the similarity of each bin according to the technique of principal compo- 
nent analysis (PCA). It can thus be considered as a representation of the sonic 
breadth within the category of noise. Furthermore, the ‘edyson’ tool provides an in- 
teractive environment, allowing for aural exploration of the results. By zooming 
in on the noise and listening back to each group of bins, it is possible to identify 
certain key features within the cluster. I was able to manually detect four general 
types of noises, which are represented by the color-coding. The green area corre- 
sponds to sounds with plausible animal origin, or associable with a natural ambi- 
ance like leaves in the wind or waves. The red area contains different human 
sounds, such as hands clapping or the intermingled voices of a larger crowd. The dark 
area is composed of noises that are mainly media-related, like hisses and loud 
electronic noise. The larger blue area is composed of sound effects, mostly con- 
sisting of one particular bell sound, announcing the start of the news. A compari- 
son with the noises of P3 broadcasting, presented in Figure 5, reveals a structural 
difference. 

Figure 4: Visualization of the audio content in the category “noise”, by means of the ‘edyson’ 
module. A sample of noises in the P1 1980, 1983, 1986, and 1989 data sets has been disassembled 
into two-second segments and arranged according to similarity. The outcome can be interpreted as 
a map of the sonic features in the data where a more condensed cluster indicates more 
homogeneous acoustic content. Color-coding has been done by manual listening and designates four 
categories of sound. Blue nodes are segments that contain sound effects and jingles, not 
determinable as music. Red nodes are crowd sounds and other human, non-speech noises. Green 
nodes correspond to environmental sounds and dark nodes are technical noises. 
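
The segmentation into two-second bins that precedes such a similarity map can be illustrated with a short sketch. It does not reproduce the internals of ‘edyson’; the choice of a log-magnitude spectrum as the per-bin feature, the function name and the test signal are all assumptions made for the sake of the example.

```python
import numpy as np


def bin_spectra(signal, sample_rate, bin_seconds=2.0):
    """Cut a mono signal into fixed-length bins; return one log-magnitude spectrum per bin."""
    samples = np.asarray(signal, dtype=float)
    bin_length = int(sample_rate * bin_seconds)
    n_bins = len(samples) // bin_length             # an incomplete final bin is dropped
    bins = samples[: n_bins * bin_length].reshape(n_bins, bin_length)
    magnitudes = np.abs(np.fft.rfft(bins, axis=1))  # one spectrum per row (bin)
    return np.log1p(magnitudes)


# Hypothetical test signal: five seconds of a 100 Hz tone sampled at 1 kHz.
sample_rate = 1000
t = np.arange(5 * sample_rate) / sample_rate
features = bin_spectra(np.sin(2 * np.pi * 100 * t), sample_rate)
```

Each row of `features` describes one bin; a dimensionality reduction such as PCA can then place bins with similar spectra close to each other on a two-dimensional map.
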

Figure 5: Visualization of the audio content in a sample from the category “noise” in the P3 sample 
sets, 1980, 1983, 1986, and 1989. Color-coding has been done by manual listening and designates 
three categories of sound. Blue nodes are segments that contain sound effects and jingles, not 
determinable as music. Red nodes are crowd sounds and other human, non-speech noises. Dark 
nodes are technical noises. 

Though the plots are composed from only a small sample, we can still speculate on the re- 
lationships between the difference in cluster shape and the character of the con- 
tent. The P3 data exhibits less uniformity, with a broader area of distribution. 
However, in contrast to the case of P1, manual coding revealed that most noises 
in this data were made up of jingles and other sound effects of musical nature. In 
comparison to the environmental noises of P1, which have finer variations, these 
sound effects utilize much larger areas of the frequency spectrum. The result is a 
wide distribution, with noticeable outliers in P3 noises. The identification of 
sound effects and jingles has historical significance. Prior research has shown 
how the regional alternatives appearing throughout the 1980s incorporated these 
techniques from American and British commercial broadcasting (Forsman 2000). 
My results seem to indicate that this sonic style also had an impact on P3 broad- 
casting. In order to further grasp this tendency, Figure 6 measures the acoustic 
diversity index in the noises of P1 and P3 throughout the decade. 

Figure 6: The ADI of the extracted ‘noise’ from each year in the data set. P1 measurements are 
coded in blue, and P3 in red. The noise ADI indicates a similar increase over the decade as witnessed 
in the total data, albeit less pronounced. More importantly, the score remains similar to the total ADI, 
indicating that the noises compose a significant part of the overall acoustic diversity. The X-axis displays 
the year in the data set and the Y-axis is the average Acoustic Diversity value in the data. 

The results from this measurement indicate that levels of diversity depend 
heavily on the noise category. In P3, noise can be interpreted to compose more 
than 90 % of the total diversity. There appears to be a positive relationship between 
the distribution of sounds within the noise category and the average acoustic diver- 
sity. P3 exhibits a high variety of sounds that are not identifiable as speech or 
music, thus increasing the diversity of sonic events. In contrast, P1 develops a more 
nuanced noise register, yet without the general distribution of P3. Furthermore, it 
is important to note that this source of diversity, the noise category, actually only 
corresponds to 8 % of the total content over time on P1 and 6 % on P3. This 
clearly highlights the limitations of the frequency spectrum. Even if it has been pos- 
sible to propose tendencies in the overall distribution of sounds of Swedish radio 
throughout the decade, zooming in on the frequency spectrum only reveals certain 
aspects of the sonic. To comprehend the diversity of recorded sound demands fur- 
ther study of the time-axis. Therefore, the final section of this analysis will pertain 
to the order of content on a higher, time-bound scale of broadcasting. 


7 Scaling the time-axis 


To grasp the rhythmic variations of Swedish broadcasting requires zooming out 
from the details in the noise. Considering how speech and music still constitute the 
majority of the radio content, these two categories can be employed for rhythmic 
analysis. The aim is to consider the predictability of content on the time-axis. In the 
following graph, the similarity in the order in which speech and music occur 
throughout the day is represented along the axes of a PCA. This method derives 
two composite dimensions that capture the largest share of variation in the data, 
which serve as the X- and Y-axes. Figure 7 maps the 
correspondence between separate broadcasting days, where closer clustering implies a 
higher degree of internal similarity within the yearly sample. To study the homoge- 
neity of this rhythm over time, the plots are composed of all 20 broadcasting days 
from each sample year. Figure 7 plots the P1 data from 1980, 1983, 1986 and 1989. 
Figure 7: Principal components analysis (PCA) of the order of content in P1 data from 1980, 1983, 
1986, and 1989. Each data point corresponds to a sampled day and the principal components have 
been computed from the distribution of speech and music in each 10-minute window as variables. 
X and Y-axes are two variant approximations of the trends in the total data. Darker color-coding 
indicates a later date. Components clustering closer to each other are more similar and those that 
cluster closer to the origin have less deviation from the overall mean. 
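
A projection of this kind can be sketched with a plain singular value decomposition. The example assumes that each broadcasting day is a row of 144 values (one per 10-minute window, for instance the share of music in that window); the data are randomly generated stand-ins for a regular and an irregular year, and the function name is hypothetical.

```python
import numpy as np


def pca_2d(day_matrix):
    """Project rows (days) onto the first two principal components."""
    data = np.asarray(day_matrix, dtype=float)
    centred = data - data.mean(axis=0)  # the origin becomes the overall mean day
    _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ components[:2].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:2]


rng = np.random.default_rng(0)
n_windows = 144  # 24 hours in 10-minute windows
base = rng.random(n_windows)  # hypothetical typical daily schedule

# Twenty days each: small deviations from the schedule versus large ones.
regular = base + 0.02 * rng.standard_normal((20, n_windows))
irregular = base + 0.30 * rng.standard_normal((20, n_windows))

scores, explained = pca_2d(np.vstack([regular, irregular]))
spread_regular = np.linalg.norm(scores[:20], axis=1).mean()
spread_irregular = np.linalg.norm(scores[20:], axis=1).mean()
```

Days that follow the common schedule closely end up near the origin, while irregular days scatter outwards; the average distance from the origin therefore gives a rough numerical counterpart to the visual tightness of a cluster.
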


The sample sets from P1 display a discernable tendency towards higher degrees of 
relative homogeneity. The transparent nodes, corresponding to earlier dates, con- 
tain more outliers and extend along both the X- and Y-axes. This broad distribution can be 
contrasted with later dates, indicated by the more opaque nodes, which tend to center 
in a narrow cluster with certain dates almost overlapping. It is possible to interpret 
these results as evidence that the distribution between music and speech becomes 
more ordered over the decade. In this sense, P1 becomes more predictable in re- 
gard to sonic content, and predictability is in turn also a valid measurement of di- 
versity. This was anticipated by Åberg’s contemporaries when they praised “relative 
entropy” (Åberg 1996). The concept of entropy, though disputed, is generally con- 
ceived as a measurement of the total amount of predictable states within a system 
(Letzler 2015). If we consider the distribution of music and speech as a set of states 
along the broadcasting day, the collected sample from 1989 in fact exhibits a higher 
degree of entropy. It is a result that nuances the previous findings, in which Swed- 
ish radio appeared to become more diverse. It recalls the results in the first part of 
this analysis, where the distribution of acoustically diverse events appeared to in- 
crease, whilst the complexity over time displayed more ambivalent results. The rhyth- 
mic content analysis pertains to a more granular dimension of the audio data, yet 
both results testify to the difference between zooming on the time-axis, and zoom- 
ing in the frequency spectrum. In order to compare the historical development, the 
following figure plots the same aspect in the P3 data. 
Figure 8: PCA plot of the speech and music distribution throughout the day in P3 broadcasting from 
1980, 1983, 1986, and 1989. Notice how most outliers consist of dates from 1983 and 1986. 


This cluster exhibits significant differences from the P1 sample data. P3 displays a 
higher level of similarity towards the end of the decade, but the early, transpar- 
ent nodes are also comparatively closely aligned. The outliers are instead com- 
posed of dates in the middle of the decade. Besides the tendency toward higher 
homogeneity by the end of the decade, there is thus a curious indication of 
greater variation in the sample set from the middle of the decade. One plausible 
explanation for this irregularity can be sought in archival circumstances. As men- 
tioned above, between 1983 and 1986, P3 hosted regional content in short seg- 
ments throughout the day. National broadcasting would come to a halt and each 
listener would subsequently receive content from the respective regional sender. 
However, as this content was place-variant, the archive only registers gaps in the 
content. These daily gaps are a likely explanation for the noticeable increase in 
heterogeneity. How to regard it from a historical perspective remains open for 
discussion. One interpretation would be that the P3 content, at least as it was re- 
ceived by each individual listener, became more varied, integrating the sounds of 
an entirely different radio station. Nevertheless, what becomes diversified is not 
the sounds produced by P3. Instead, the data from 1989 seem to suggest that the 
status towards the end of the decade is, just like in P1 broadcasting, less varied. 

This observation has media theoretical resonance. In contrast to concerns 
about the organizational diversity of radio, there is a small but long tradition of 
thinking about structural homogeneity in broadcasting. Back in 1977, sound ecolo- 
gist R. Murray Schafer speculated on the development of broadcasting media: 


Each radio station has its own style of punctuation and its own methods of gathering the 
material of its programs into larger units, just as the phrases of language are shaped into 
sentences and paragraphs. Different events are repeated periodically in daily or weekly 
schedules, and within each day certain items may be repeated several times at fixed inter- 
vals. (Schafer 1977) 


Writing in the middle of the 1980s, radio scholar Andrew Crisell made a similar 
observation, proposing that internal competition between channels pushed broad- 
casting towards ever more predictable content (Crisell 1986). The rhythmic distribu- 
tion of speech and music on Swedish radio appears to verify these predictions. Yet, 
as the analysis aims to display, sonic phenomena are complex and the historical 
development is not without ambiguity. Instead, the very concept of diversity be- 
comes subject to multiple interpretations. Certain aspects of the sonic content do 
appear to become more diverse throughout the decade, while other aspects relating 
to rhythm exhibit the opposite tendency. 


8 Final notes 


Åberg concludes her critical reflections on radio diversity by stating that the “op- 
erationalization of content diversity has rendered it an irrelevant category for 
the everyday understanding of radio” (Åberg 1996: 7). She perceived contempo- 
rary media research to conflate the complex concept of diversity into either orga- 
nizational structure or checklists of political opinions. Yet, as Åberg argued, there 
is no direct causal relationship between different aspects of diversity, and by fo- 
cusing on too narrow and operational definitions, media research risks missing 
out on the more nuanced scales of radio. The results from my analysis underscore 
this point. It is only by remaining dynamic to the subject of study that the 
manifold aspects of sonic diversity are revealed. Entropy, as Åberg noted, needs 
to be relative to something. The analysis has revealed oppositional lines of devel- 
opment on separate scale levels, both within respective channels, as well as in the 
relationship between the two channels. Starting from a zoomed-out perspective, 
treating the entire audio data as an unsegmented mass, it was possible to detect 
contradictory tendencies in the sonic content. Within the frequency spectrum, 
both channels indicated an increasing breadth of content. It was furthermore pos- 
sible to determine that the main source of acoustic diversity is neither 
spoken nor musical content, but the few occasions of other sounds or noises. 
However, bearing the time-axis in mind, it is possible to detect an increasingly 
repetitive distribution of content throughout the broadcasting day on both chan- 
nels. In this regard, the two channels also grew increasingly differentiated from 
each other. Thus, considered as a system of multiple channels, differences are 
gradually more distinguished, whilst each channel develops a more predictable 
rhythm of content. The two channels appear to become more distinct from 
each other, rendering a more heterogeneous broadcasting selection. This happens 
at the same time as the internal homogeneity of each respective channel increases. 
The results thus contribute to our understanding of how broadcasting content de- 
velops over time within a closed environment. 

I will use the final paragraphs of this essay to reflect on the larger significance 
of these results. Beyond the historical significance, these results are also a testa- 
ment to the character of recorded sound. The way in which the subject of sonic 
diversity has shifted through different perspectives is not unique to Swedish Radio. 
Rather, it reflects how scaling is a necessary instance when working with audio 
data. As the analysis demonstrated, the sonic material was only possible to explore 
by engaging several different scale steps in succession. Yet, the order and choice 
of scale allow for different nuances in the data. Whether we choose to zoom 
within the frequency register, or along the time-axis, 
different results are rendered. Visual representation, automatic analysis, and care- 
ful listening need to be applied consistently. The manner in which we arrange the 
possible stages of mediatization affects the object of study in ways that are not arbi- 
trary, but require precision. At this point, a new theory of scale in sound studies 
must neither resort to naive positivism nor become ensnared in the debates of rela- 
tivism. Though it is possible to learn from previous debates around textual data, 
recorded sound calls for new perspectives on scalable research. 

My suggestion is that a future theory of scale in the study of audio data might 
find inspiration, neither in tabular data nor in text analysis, but within a neigh- 
boring field of research concerning the same modality. Within the realm of soni- 
cally-oriented knowledge, music theory already offers a highly developed concept 
of scale which diverges from other applications in the sciences. Music, and espe- 
cially harmony, has been thoroughly scrutinized for its historical ties to positivis- 
tic conceptions of data (James 2019). Nevertheless, as music theorists Daniel Chua 
and Alexander Rehding recently reminded us, music and its associated theoriza- 
tions allow for much more nuanced and complex thinking (Chua and Rehding 
2021). The musical scale does not designate a singular corrective but provides 
multiple solutions for the movement between orders of magnitude. Certain solu- 
tions unveil the capacity of the material in a more striking manner than others, 
yet the decision always takes place at the intersection between aesthetics and 
truth, and the choices are up to the researcher. As is the case of this analysis, the 
more abstract order of magnitude might render similar results as what is detect- 
able on a small scale. Yet, frequency and temporality can be segmented in many 
ways in between, adding nuances and dynamism to the results. Thus, sonically 
invested scholarship must further explore and experiment with the possible com- 
bination of scales, and scale steps available within the scope of audio analysis. 
Therefore, this article is a call for scale exercises. ‘Zoom’ is no longer a mere on- 
omatopoetic description for a visual operation. Sounds are always already being 
zoomed. 
Complexity and Analytical-creative 
Approaches at Scale: Iconicity, 
Monstrosity, and #GraphPoem 


Abstract: By revisiting concepts such as monstrosity and iconicity, this chapter 
reconsiders digital writing from a complex system perspective involving multi- 
scale architecture, performativity and intermedia. The case in point is the Graph 
Poem project—applying graph theory and natural language processing in poetry— 
particularly with its analytical-creative approaches informing certain computation- 
ally assembled anthologies coming out of the project. The algorithms deployed in 
those anthologies operate at different scales, expanding networked text corpora by 
adding poem-nodes based on both small-scale, poetry diction related, and large- 
scale, network-topology-relevant criteria. Such methodology illustrates a politically 
updated data-focused relevance of monstrosity beyond the recent post-disciplinary 
theory, as well as a concept of iconicity drawing mainly on writing theory and per- 
formance studies. A complexity-informed notion of digital writing thus emerges 
foregrounding scale, monstrosity and iconicity as conjoined features of digital 
space and the digital in general. 


Keywords: digital writing, complex dynamic systems, monstrosity, iconicity, net- 
work analysis 


1 Introduction: Digital writing from a complex 
systemic and analytical-creative perspective 


This chapter contends that monstrosity and iconicity are fundamental features of 
computational approaches to text corpora and networks in digital space and that 
the interconnectedness of the elements involved in digital writing and their be-
haviour make this type of writing strikingly similar to a complex system. Complex
systems have been successfully used to model and explain the heterogeneity and 
interrelatedness of real-world phenomena. Santa Fe complexity experts John 
H. Miller and Scott E. Page explain that “[a]t the most basic level, the field [. . .] 
challenges the notion that by perfectly understanding the behavior of each com- 
ponent part of a system we will then understand the system as a whole” (2007: 3).


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial 4.0 International License. 
https://doi.org/10.1515/9783111317779-010 
They also emphasize the fact that complexity is interested in a state between sta-
sis and chaos, control and anarchy, particularity and universality. 

A complex systemic approach to digital texts is thus two-pronged: on the one 
hand, it posits that any change in one of the elements of a corpus (a text, in our 
case) has both direct and indirect effects on all the other interconnected elements, 
as well as on the place and significance of a text within the larger collection; on the other,
it acknowledges the emergent nature of digital writing, whose evolution cannot be 
foreseen by simply referring to the sum of properties of the texts that it is made of;
rather, computational emergence makes digital writing a radically novel macro- 
level entity with respect to a micro-level substrate, a structure in which low-level 
elements are in tension with the higher level ones. Nevertheless, the organized com- 
plexity of computational approaches and the inevitable reductionism in the model- 
ling of such writing occasion a reflection on the notion of control which, we argue, 
could be used to enhance the emergence of digital textuality and its openness to oth-
erness, to ‘foreign’ traditions and modes of writing. The emergent digital textualities
rely on traceable and quantifiable properties of existing corpora, but they turn the 
analysis of such a controlled medium into a creative catalyst for further multimodal 
expansions. 

In the second section “Texts as (photo)graph and performative relays”, we 
draw on performance theory to propound a more nuanced notion of (poetic) 
texts as computationally inter-related “performative relays” (Wildrich 2014a and 
2014b)—documents acting as continued transfers of the performances they docu- 
ment, so much more amenable in our context given the inherent performative 
nature of poetry—and build on this to foreground a generalized and inclusive 
concept of writing ranging from natural-language-based to coding to interface 
and platform programming. The notion of performative relay also conveys well, 
we believe, the emergent nature of digital writing—understood as closely deter- 
mined by the occurrence of “complex large-scale behaviours from the aggregate
interactions of less complex parts” (Holland 1995: 11)—and the ways in which its
evolution is informed by the history, medium and mediation of the texts involved. 
As such, the expanded concept of digital writing advanced by this contribution 
refers to complex systemic environments activated by intermedial and performa- 
tive—and, therefore, iconic—presence as “presentification” potentially turning
out creative (in several data-relevant senses, and thus monstrous) in drawing on 
contemporary culture's pervasive analytical modes of mediation. 

In order to elucidate that concept, in the third section “Iconic digital writing 
for/as #GraphPoem complex systemic & analytical-creative approaches”, we criti-
cally engage with Sybille Krämer's notion of iconicity (2003) and with John Cayley's
metaphor of the icon (2006) and revisit the notions from a complexity thinking 
point of view, treating digital writing as an open communication system engaged in 
continued feedback loops with the medium and the computational model. The
interoperability informing the digital, we argue, is what opens texts to new modes 
of writing, giving them an organicist dimension: writing in digital networked envi- 
ronments is alive and everchanging. It is embedded in the medium, intermedial 
and part of the mediation, that is, it is performative and spatial, while also making 
the beyond and the inherent present through continued interaction. 

Such distinctions pave the way for emphasizing in the fourth section “Text 
and/as network; control and rock ‘n’ roll: systemic monsters’ ubiquity and mon- 
strous resistance” the dynamic nature of the relationship between the elements of
digital writing and its multi-scale architecture, specifically in a complex mathemat- 
ics and analytical-creative framework. We argue in favour of natural language 
processing (NLP) combined with network analysis as a most effective way to com- 
putationally chart text corpora in general and thus better understand digital writ- 
ing. The appropriate mathematical model for such corpora and investigations is 
the (multilayer) network, whose complexity and dimensionality raise challenging 
issues related to processing, operability and comprehensibility. We posit that the 
most suitable theoretical framework is represented by complexity and graph the- 
ory applications with an emphasis on representative concepts like monstrosity and 
iconicity that involve aspects of both all-engulfing magnitude and detailed specific- 
ity. These concepts are particularly useful as they straddle the apparent conflict 
and actual symmetry between the ubiquitous control informing the digital and dig- 
ital writing as potentially working—mainly by means of its complexity and analyti- 
cal creativity—against such control. We consequently examine digital writing by 
means of complex network (graph) theory and we use the Graph Poem project and 
its developments as a case in point. The Graph Poem involves poetry corpus assem- 
blage and expansion—a specific form of data-related complex-mathematics-driven 
creativity—as well as computational-analysis-informed creative writing/generation 
and intermedia (e.g., #GraphPoem @ DHSI in Section 5). 

Following up on the digital-writing-relevant tie-in between complex mathe- 
matics and analytical-creative (humanities) computing, on the one hand, and 
monstrosity and iconicity, on the other, in the last section “Demo(nster)s of net-
worked textualities and computational (analysis-based) writing”, the discussion
will review these two latter concepts considering the recent literature. Monstros- 
ity will be analysed from a literary studies, digital humanities, and remix studies 
perspective, and iconicity will be situated in a writing, performance and remix- 
studies-relevant framework. While monstrosity is generally seen as a Franken- 
stein of the humanities embodying DH and, thus, referring mainly to issues of dis- 
ciplinary ‘trespassing’ and embracing the humanistic ‘Gothic’ dark side, but also
to (paradoxical) political/cultural complacency or un-criticalness, we will argue 
that it is a feature of both innovative, transgressive or complex approaches as 
well as of the insidiousness of ubiquitous systemic control in digital space and
cultures informed by magnitude and scalability. Iconicity will then be revisited in 
relation to digital writing’s intermediality, the significant role played by the visual 
and visualization in digital cultures, and complex surfaces in electronic literature 
(and beyond), while arguing that it is inextricably intertwined with monstrosity 
in simultaneously large-scale and detail-intensive contexts. 

The conclusion will reinforce the literary, philosophical and historical relevance 
of the highlighted project from a complexity angle grounded in NLP and graph the- 
ory and employing the cultural lenses of monstrosity and iconicity in a way that 
advances specialized research on control at the graphic, linguistic and algorithmic 
level in digital writing. We conclude that the multi-scale architecture of specific pro- 
cedures within projects such as the Graph Poem sets the ground for an analytical- 
creative approach to textuality, one that uses the medium-informed apparent fixity 
of existing corpora to create further scalable instantiations of digital writing. 


2 Texts as (photo)graph and performative relays 


Text remains a volatile concept and reality at the crossroads of an impressive 
number of subjects, approaches and media. This volatility—perhaps more appar- 
ent than ever in the age of connectivity—is still interestingly contemporaneous 
with very diverse and even conflicting acceptances of text, and one should per- 
haps not be surprised if terms like “block” cohabit with more dynamic and inter- 
active ones sometimes within the same approach. Interestingly enough, from a 
dynamic systems point of view, “blocks” are five behaviour patterns that confer 
movement to the structure, either through linear and exponential growth, or 
through decay, collapse or oscillation (Deaton and Winebrake 1999). They are 
never static. If we examine digital writing through the lens of complex systems 
though, texts will be conceived as large dynamic multi-mode, multi-link networks 
with various degrees of uncertainty, which arise from the inflows and outflows of 
information that animate the system. 

In literary history, for instance, Copeland and Ferguson (2014) speak of a 
number of “building blocks” of literary studies [that have occasioned a series of
conferences and special issues of English Literary History from Johns Hopkins
University Press], including text, that “may persist as references in dynamic theo-
retical environments, remaining central to our thinking as literary scholars; but
as our discipline expands, their conceptual fixity is not assured” (2014: 417). Text
finds itself in these new dynamic contexts at a confluence of disciplines and ap- 
proaches among which the above-mentioned editors, who otherwise provide a 
generous enumeration and pertinent brief review of main relevant concepts in
the humanities (only), do not include digital humanities or any relevant computa- 
tional subject or approach such as text analysis, NLP (natural language process- 
ing), machine learning, etc. That is why, perhaps, the main focus remains—as in 
the quotation above—on the “theoretical environments” and the “conceptual fix-
ity”, whereas a perspective grounded in computationality and digital space would
rather focus on interactive media-related environments and on the instantiating, 
iterative and operational dynamism of text in digital media. 

It is such environments and dynamism that we want to focus on in our con- 
siderations, while also aiming to shed light on the theoretical aspects of text aris- 
ing from practice-driven and media(tion)-informed contexts. Yet, our approach is 
far from meant to engender any fresh new binaries to replace the old ones; it 
aims instead to go beyond stiff oppositions such as ‘traditional’/analogue vs digi- 
tal/computational and rather see them as coexisting and influencing each other. 
We propose to depict digital writing as a dynamic system, an expansion in which 
stability and instability, analogue and computational coexist at the edge of chaos,
a state that allows any system to survive and proliferate. Since (‘traditional’)
text in digital contexts¹ is at stake in our research—mostly ‘page poems’ in net-
worked mediating environments for #GraphPoem—it becomes in fact imperative 
to straddle such divides. As we will see in what follows, an effective way to do 
that is to translate the text question into one on writing. Before getting there, we 
need to explore the intermedia and performative qualities of text in digital (net- 
worked) environments. 

Writing never happens in isolation or in one single medium. This is so much 
the more so when we refer to writing in the digital realm. Literature on the effect 
of the digital on born-analogue media has been proliferating over the past years. In 
a book on the literary invention of digital image, for instance, Monjour (2018) ex- 
plores the evolution of photography as shaped by literature and as metamorphos- 
ing into post-photography in the digital age. The ontology of post-photography 
speaks to photography's transitioning from the representational to the performa- 
tive whereby it gets to condition the real instead of simply recording it. In this new 
ontology and ecology, (post)photography is unmoored from the past and plunges 
into virtually possible futures or alternative temporalities. 

We find Monjour's concept of image ecology sensibly consonant to our own 
notion of (digitally) performed commonalities between poems and corpora within 


1 Not necessarily the same thing as “digital text” which can present features such as hyperlinks,
(embedded) video or audio, interactive images (photo galleries, maps, diagrams, simulations), in- 
teractive questions, etc., cf. Loss Pequeño Glazier, Digital Poetics. The Making of E-Poetry (Tusca- 
loosa: The University of Alabama Press, 2002). 
the Graph Poem, while speaking conveniently to the intermediality of texts in dig-
ital contexts and digital writing. We will provide more background on the Graph 
Poem in what follows, but for now suffice it to say that it is a project representing 
poetry corpora as network graphs in which the nodes are the poems and the 
edges correlate those poems based on quantified poetic features. Analysing the 
latter for graph-theory-relevant features will reflect on the poems and corpora 
involved as both individual and interrelational conglomerates. In Monjour’s 
terms, those graphs perform (and “presentify”)² rather than represent poems
and corpora. The text of the poem (as graph poem, as network/‘photo’ . . . graph) 
thus becomes a multifaceted, dynamic and interactive reality that shapes and is 
shaped in its turn by the various and multilayered con-texts it is located—and 
performed—in. 
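
The graph representation just described can be illustrated with a minimal sketch. It is not the project's actual code: the feature vectors, the poem titles, the cosine measure and the 0.9 threshold are all assumptions chosen for illustration, and the names `cosine` and `build_poem_graph` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_poem_graph(features, threshold=0.9):
    """Return {poem: {neighbour: weight}}, linking poems whose similarity >= threshold."""
    graph = {title: {} for title in features}
    for a, b in combinations(features, 2):
        weight = cosine(features[a], features[b])
        if weight >= threshold:
            # Undirected edge: store the weight in both directions.
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


# Hypothetical quantified poetic features per poem:
# (share of rare words, rhyme density, figures of speech per line)
features = {
    "poem_a": [0.30, 0.80, 0.10],
    "poem_b": [0.28, 0.75, 0.12],
    "poem_c": [0.90, 0.05, 0.60],
}
graph = build_poem_graph(features)
```

In this toy corpus only the first two poems end up linked, so the third remains an isolated node until the corpus or the threshold changes.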

Texts in digital media are indeed performed by the ‘simple’ fact of being ac-
cessed, opened, read and downloaded (cf. Kirschenbaum’s and Vauthier's “.txtual
condition,”³ as well as Lori Emerson's reading-writing interfaces⁴) and this is so
much the more the case with poems and/as documents in particular. Poems are 
traditionally perceived as documents of an ‘original’ performance involving the 
staging of certain states of facts expressing, shaping, deconstructing or demystify- 
ing (if not altogether denying) a sense of self as negotiated in lingual inscription. 
In simpler terms, the poem is (‘traditionally’) the recording or documentation of a 
hypothetical (performative) ‘writing’ event. Yet, documentation is already part of
an event or performance the moment the latter is under way (while there is also 
performativity to documentation, as we will argue next). This is literally the case 
when a literary text is seen as a social occasion (Carruthers 2014) or when a 
poet's (Donne's) ‘manuscripts’ (Kastan 2014) prove to be the result of multiple 
‘non-authorial’ interventions and social negotiations.

In more complex frameworks, though, the documentation can be part of the 
performance in ways that enrich both with multiple temporalities and spatial- 
ities. The concept of “performative relays” proposed by Mechtild Wildrich, for in-
stance (Wildrich 2014a), refers to various documents of a performance that by 


2 “La photo renvoie à ce qui « sera » ou ce qui « pourra être », s'attribuant ainsi une performativité
où le geste de présentation/présentification prend le pas sur la fonction de représentation” (Monjour
2018: 44). 

3 Matthew Kirschenbaum, “The .txtual Condition: Digital Humanities, Born-Digital Archives, and 
the Future Literary." Digital Humanities Quarterly 7.1 (2013), accessed May 1, 2022, https://bit.ly/ 
1mCbo2G; Bénédicte Vauthier, “The .txtual Condition, .txtual Criticism and .txtual Scholarly Edit- 
ing in Spanish Philology,” International Journal of Digital Humanities 1 (2019): 29-46. 

4 Lori Emerson, Reading Writing Interfaces. From the Digital to the Bookbound (Minneapolis: 
University of Minnesota Press, 2014). 
means of giving a new place, time, and medium to the latter actually continue it
and thus, in fact, make it possible, [re]enact it (Wildrich 2014b) on certain ‘new’ 
levels and ‘novel’ contexts. In borrowing and transferring this concept to poetry, 
we bound it to refer to the text of the poem ‘solely,’ thus highlighting and redefin- 
ing text as the document predating, shaping, and, at the same time, recording any 
performance of the poem (cf., for instance, Tanasescu 2022). Text—as document 
of poem—is itself always shaped, dismantled, and again reshaped in its turn—re- 
written and revised, re-inscribed and re-performed—in various networked con- 
texts (Nicolau 2022,⁵ Tanasescu 2022, Tanasescu and Tanasescu 2022). From the
angle of digital writing seen as intermedia and performative activity, writing a 
graph poem involves (re)writing any poem-node in the graph as the corpus is it- 
self (re)written, and thus also writing in digital space the moving (photo)graph of 
the network that “presentifies” the corpus. 

The paradoxical status of text in this context is that of document predating
the performance of the poem in digital space. At the same time, since text can be, 
as we argue, redefined in digital (writing) contexts as performative relay(s), it 
emerges after the writing/performance at hand as its recording(s). If, for instance, 
in the example in Section 4, an expanding corpus of poems is (re)present(ifi)ed as 
a network in which the edges are correlations between the vectors representing 
each poem-node, the text of each poem written computationally will evolve along- 
side the corpus per se. The interrelational reticular architecture of the corpus as 
digitally written/performed will be reflected by the evolving digital inscription, 
i.e., document, of each poem involved, its corpus-dependent vector. 
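
One way to picture a "corpus-dependent vector" is a weighting scheme whose values for a given poem shift whenever the corpus grows. The sketch below uses plain TF-IDF for that purpose; this is an assumption made for illustration, since the chapter does not specify which vectorization the project relies on, and the toy poems and the function name `tfidf_vectors` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(corpus):
    """Map each poem title to a {word: tf-idf} dict computed over the whole corpus."""
    tokens = {title: text.lower().split() for title, text in corpus.items()}
    # Document frequency: in how many poems each word occurs at least once.
    doc_freq = Counter(word for words in tokens.values() for word in set(words))
    n_docs = len(corpus)
    vectors = {}
    for title, words in tokens.items():
        counts = Counter(words)
        vectors[title] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors


# Hypothetical two-poem corpus, then the same corpus expanded by one poem.
corpus = {"poem_a": "the sea the salt", "poem_b": "the city at night"}
before = tfidf_vectors(corpus)["poem_a"]

corpus["poem_c"] = "salt on the sea road"
after = tfidf_vectors(corpus)["poem_a"]
```

The words of `poem_a` have not changed, yet the weights of "sea" and "salt" drop once a new poem sharing them joins the corpus: the poem's inscription is rewritten as the corpus is rewritten.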


3 Iconic digital writing for/as #GraphPoem 
complex systemic & analytical-creative 
approaches 


Sybille Krämer (2003) offered an articulate systematic reformulation of writing
truly pertinent to our subject and the example above. Since it does not equal lan-
guage use but rather provides philosophical—and computational—models for the
analogue flow of language, she argues, writing is not solely discursive, but “iconic”
as well. Inasmuch as it is a “cultural activity,” writing is in fact not dependent on either


5 Felix Nicolau (2022) revisits the #GraphPoem poetics and particularly Various Wanted
(MARGENTO et al. 2021) from a linguistic angle, drawing on Coseriu's integral linguistics and linguistics
of context. 
(natural) language or semantics. Moreover, in its “operative” version (the one based
on calculus and informing higher mathematics and other formal symbolic systems), 
writing is profoundly and effectively desemanticizing and anti-hermeneutic (Krämer 
2003: 531) while nevertheless (or so much the more so) preserving and developing 
specific cognizing (and cultural) functions. Krämer thus manages to bridge a series 
of established divides, such as the one between text and image—since text is now 
proved to have a definitory iconic and “inter-spatial” (523 et infra) component—and 
the one between texts in natural language and other kinds of writing. Among the 
latter, of special relevance to us is obviously the writing done in programming lan- 
guages, but logical notations and even “numeric systems” are of course also of inter- 
est (518). 

Conceptual and concrete poets had tried to make this point decades earlier
through their practice, but their impact has been generally confined to literature
and aesthetics. The overlapping or even congruity of techniques and strategies in- 
volved in both programming and ‘plain’ writing was advanced by John Cayley 
(2002) in an article that appeared one year before the publication of Krämer’s con- 
tribution in English translation. While indeed not discussing concrete writing, 
Krämer’s argument goes nevertheless beyond the scope of poetry or the arts and 
makes a strong point about the iconic and (inter-)spatial nature of writing in gen- 
eral. Likewise, her elucidations related to writing as operative and even algorithmic 
technique (Krämer 2003: 534) are equally general and far-reaching, particularly 
given that digital writing is approached only towards the end of the paper in a suc- 
cinct fashion that still manages to cover some crucial aspects discussed below. Com- 
pared to that, Cayley's considerations regarding programming as already present 
in traditional/page writing, although universally applicable, are not employed to 
foreground a general concept of writing but on the contrary, to distinguish between 
the code involved in, and the resulting text of, electronic literature computational 
generation. Yet, in describing how 3D immersive virtual electronic literature 
works, Cayley uses icons as a metaphor for the unique experience involved, draw- 
ing on recent theological contributions as well as canonical Eastern Orthodox au- 
thors like John of Damascus. Just as for a believer or a religious painter the icon is 
not a representation, but a threshold beyond (and by means of) which experiencing 
the deity is literally possible, dynamic inscriptions on “complex surfaces” can “mys-
teriously” congeal into meaning for the reader/user (Cayley 2018: 10 and 222).

We want to combine particularly these two perspectives, Krämer’s and Cay- 
ley's, into the notion of the iconic relevant to our argument. Krämer’s “operative
writing” is best epitomized by calculus and higher mathematics, but being quin-
tessentially a “cultural technique”—i.e., “dealing with symbolic worlds” while in-
volving “desemantification” as a “crucial aspect” (Krämer 2003: 531)—it pervades
so many other kinds of writing, potentially all of them and, therefore, literary 
writing as well. It also bridges literary writing quite conveniently with coding
and the kind of (inter)operability informing the digital, while also covering with 
its iconicity the intermediality typical of the latter. As a medium embedded in an- 
other medium (the digital one), argues Krämer, drawing on Niklas Luhmann's 
general systems theory (Krämer 2003: 529-530),⁶ writing becomes the only form
of that new medium. 

It is in this capacity, therefore, of sole form of the digital medium, that writ- 
ing can occasion interconnections across apparently widely disparate genres, 
such as computer code and ‘page’ poetry, and thus allow us to explore the anat- 
omy and functionality of the relevant complex networks. Nothing is pure, simple 
or un-miscegenated about the connectivity of such networks: the links be-
tween poems and the topology of their graph-structured corpus—and, indeed, the 
versions of the poems themselves as read and processed by the machine—are in- 
evitably informed by the “operative writing” involved in the algorithms and the 
specific coding scripts deployed for the task or within the various interfaces and 
platforms in use. The reverse is also true. The interactions and links between the 
coding blocks, applications, and interfaces involved, the way they perform in the 
shared context and their specific outputs are coloured by the sets and models im- 
ported and by the specific contents and features of the poems and corpora under 
scrutiny. 

As we intend to illustrate the relevance of the iconicity of digital writing to 
the complex-system-based approach informing the Graph Poem (and, thus, po- 
tentially contemporary DH more generally), we will provide here a brief back- 
ground and description of the scope of the project. While various other aspects 
and applications of the Graph Poem will still continue to inevitably emerge in 


6 A note needs to be made on how we depart from Luhmann in our understanding of digital 
writing as a complex system. For the German sociologist, the definition of a system is based on 
the concept of autopoiesis (1995, 408), or self-creation, which radicalizes otherness: his notion of 
“soft” complexity is articulated around the concept of “complexity of operations,” that is, the 
number of possible relations between the constituents of a certain system exceeds the number of 
actual relationships that will happen in the said system. Luhmann’s view of complexity is thus 
largely reductionist, because the complexity of operations entails selection and because systems 
are separated from the environment by a boundary: he only acknowledges the differences that 
arise between systems (what makes one system different from another) and the difference be- 
tween a system and the environment, but he does not fully address the problem of the differen- 
ces arising between units in the same system—that is, the problem of heterogeneity. Luhmann's 
difference is contained within the system and is a condition of the system's self-referentiality 
and closure, whereas the present contribution sees digital environments in general and digital 
writing in particular as essentially open to external interactions and ultimately internalizing 
such interactions. 
our argument as we go along, here we will refer to its main tenets and milestones.
Initially started at San Diego State University in 2010, continued on a Social Scien- 
ces and Humanities Research Council grant at the University of Ottawa starting in 
2014 and then also at UCLouvain since 2019, Graph Poem involves deploying NLP 
and graph theory applications in computational poetry analysis. The project's ob- 
jectives are to digitally define, compute, represent, expand, and evaluate graphs— 
networks of nodes and edges, where the nodes are the poems and the edges, links 
between the poems defined according to genre-based criteria—as computational 
tools to (re)organize North American (and then progressively other) poetries and 
to discover new relevant commonalities and paradigms among various poems 
(and, therefore, poetries and poets). More specifically, the edges quantify similari- 
ties between these poems in genre-relevant respects, such as diction, meter, sonic 
devices, figures of speech, etc. Since it was advanced by a poet, the project had from the
very beginning a strong creative component: the graphs were initially used in 
writing poems presenting certain stylistic features and/or (thus) fitting into or en- 
riching specific corpora and collections, as well as in teaching poetry and creative 
writing (Tanasescu 2011). 

As the project advanced, this trait developed into a computational analytical- 
creative approach. As detailed below, the computationally assembled anthology 
“US” Poets Foreign Poets (MARGENTO 2018) foregrounded such an approach as rele- 
vant to locating and integrating relevant data. An initial corpus of contemporary 
(North) American poems was algorithmically expanded to include poems (from 
any other literature, region and period) that proved to have particular NLP-based 
features resulting in their positioning and ranking within desired ranges in the 
overall evolving network (see Section 4 for further specific details). The complex cou-
pling of NLP and network analysis resulted in the creation of a one-of-a-kind 
evolving dataset making up the ongoing project of the collection, of which the 
published anthology was only a momentary stage. 
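
The expansion logic described above, in which a candidate poem is admitted only if its position in the evolving network falls within a desired range, might be sketched as follows. This is a simplified illustration under stated assumptions rather than the procedure actually used for the anthology: degree centrality stands in for whatever ranking the project computes, and the similarity table, the 0.5 threshold and the rank range are hypothetical.

```python
def degree_centrality(graph):
    """Fraction of the other nodes each node is linked to."""
    n = len(graph)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0) for node, nbrs in graph.items()}


def expand_corpus(graph, candidates, similarity, threshold=0.5, rank_range=(1, 2)):
    """Try each candidate in turn; keep it only if its centrality rank is in rank_range."""
    for cand in candidates:
        # Link the candidate to every current poem it resembles closely enough.
        links = {node for node in graph if similarity(cand, node) >= threshold}
        trial = {node: set(nbrs) for node, nbrs in graph.items()}
        trial[cand] = links
        for node in links:
            trial[node].add(cand)
        scores = degree_centrality(trial)
        # Rank 1 is the most central node; tied nodes share the same rank.
        rank = 1 + sum(1 for score in scores.values() if score > scores[cand])
        if rank_range[0] <= rank <= rank_range[1]:
            graph = trial  # the corpus is rewritten with the new poem-node
    return graph


# Hypothetical similarity scores between candidate poems and corpus poems.
sims = {("x", "a"): 0.8, ("x", "b"): 0.7, ("x", "c"): 0.6,
        ("y", "a"): 0.1, ("y", "b"): 0.2, ("y", "c"): 0.1}


def similarity(p, q):
    """Look up a precomputed score; unknown pairs count as dissimilar."""
    return sims.get((p, q), 0.0)


initial = {"a": {"b"}, "b": {"a"}, "c": set()}
expanded = expand_corpus(initial, ["x", "y"], similarity)
```

Here candidate "x" links to every poem, ranks first and is kept, whereas "y" links to nothing and is discarded. The small-scale criterion (pairwise similarity) and the large-scale one (rank in the whole network) thus operate together.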

The latter represents an instance of iconic digital writing as it intermedially 
inscribes a developing corpus and its mathematical modelling alongside the texts 
it consists of, or it keeps absorbing, in their evolutionary interrelation. In this con- 
text, analytical creativity refers to the creation of data not from scratch, but 
based on the computational analysis of existing datasets, and as (dis)located data 
now ‘made new’ by processually being integrated into, interacting with, and re- 
shaping, those initial datasets. While the process also involves the creation/gener- 
ation of new texts and/or algorithms (cf. Section 4), the main emphasis is laid on 
corpus complex-mathematics-based (dataset) assemblage and automated expan- 
sion, as a form of digital writing. 
In a more recent iteration of the project (MARGENTO et al. 2022), the creative
part of the analytical-creative paradigm⁷ is somewhat closer to a more habitual
acceptance. Initial text datasets (poetry collections) were hybridized with others 
(poetry collections or anthology contributions as well, but also academic articles 
and social media content) and then fed to topic modelling algorithms that pro- 
vided alternative readings/rewritings of those initial data (consequently also hy- 
bridized). The resulting ‘translations’, from English into English, Romanian, and/ 
through Python, progressively featured words, lines of verse and finally stanzas 
branching out into “various” multiple inter- and intra-language renditions and 
(concrete-poetry-style) rearrangements. The iconicity of such writing bridged text 
and/as visuals as well as natural language and code, alongside traditional (print 
and electronic) book formats and platform content (specifically GitHub repos).⁸

From the above-outlined perspectives, digital writing is performative and the 
project under discussion has a literal performance component too. With respect to 
the latter, Cayley's in-passing appropriation of the mystical iconic can prove quite 
useful in that it accounts for the complexity, diversity and manifoldness of inscrip- 
tions in digital space while alluding to a relevant kind of mediating transcendence. 
As argued in the previous section, digital writing is fundamentally performative 
and when an initiative like #GraphPoem involves performance per se, the interface 
becomes an “interfacing livestream” as the spectacle consists of performing an ap- 
parently usual Graphical User Interface as event venue (Tanasescu 2022). The 
venue becomes on these occasions a screen of manipulated and manipulatable 
screens, an interface of interfaces, multitudinous and connective. As such, the per- 
formed interface goes beyond the established distinction between “looking at” and 
“looking through” windows as a digital-space-based theatrical hypermedium that 
performs the interface and presents it as the performance, while also opening it up 
to various other platforms and participants or users impacting it from beyond. This 
interplay of the within and the beyond and the mediating transcendence of the re- 
lated binary speaks quite relevantly to the iconic described above. 

The type of modelling deployed in projects such as the Graph Poem involves 
network representation and analysis; therefore, the outputs provided and insights 
inspired by network analysis reflect on an actual immanence of the data at hand. 


7 While, to the best of our knowledge, “analytical-creative” is our own term in the field of liter- 
ary computing (and DH more generally), other projects and stances can be described as such as 
they also deploy and harness computational analysis for creative (literary) purposes (Drucker 
2021 and Johnston 2018). 

8 For a more detailed description of the process, see the note on the poetics in the collection 
proper (MARGENTO et al. 2022: 114-120); for a translation-studies and digital-humanities-focused 
analysis, see Nicolau 2022. 
However, the beyond is also in the guise of ongoing propulsion, that is, the above- 
mentioned computational corpus assemblage and expansion. There is, of course, 
humanistic immanence in all modelling in digital humanities (Sperberg-McQueen 
2019: 285) and there is oftentimes complexity and obviously further relevance or 
applicability as well (McCarty 2019). What interests us primarily, though, and 
what is characteristically illustrated by the project under discussion is its iconicity 
involving digital writing in its intermediality and complex reticularity setting in 
motion a dialectics of immanence and transcendence imperiously engaging sys- 
temic control. It is these aspects that tie in iconicity with control and monstrosity 
in ways relevant to our discussion. 


4 Text and/as network; control and rock ‘n’ roll: 
Systemic monsters’ ubiquity and monstrous 
resistance 


Projects like #GraphPoem are representative of a more general condition of digi- 
tal writing. On the one hand, all the data, given its inherent performativity and 
interrelationality, is in continuous transition, in a characteristic state of flow and 
flux. On the other, the ubiquity of the digital and the imbrication—if not complete 
overlapping of space and digital space (Vitali-Rosati and Monjour 2017)—makes 
control arguably ubiquitous and inescapable. “That a monster should be such a 
natural!” as Trinculo puts it.⁹ 

Within network-based digital humanities projects as well as on the Web in 
general, nodes and layers in flux mutually instantiate and perform each other. 
Such a dynamic of both inter-related and dissonant ontologies demands matching 
adaptive polyvalent models accompanied by a versatile uncompromisingly trans- 
gressive critique: a critique that needs to track down and scrutinize the numer- 
ous, if not countless points of entry for political and economic control. To the 
extent to which no data is raw or unbiased (Gitelman 2013) and no algorithm, cod- 
ing block or computational application can be developed or run in any haven or 
heaven of immunity to system domination or corporate infiltration, control is vir- 
tually ubiquitous, located at “the level of the medium itself” and “must be defined 
via the actual technologies of control that are contained within networks”.¹⁰ 


9 William Shakespeare, The Tempest (New York: Simon & Schuster, 2013), 3.2. E-book. 
10 Alexander R. Galloway and Eugene Thacker. “The Limits of Networking.” Nettime, March 24, 
2004, accessed April 9, 2022. https://bit.ly/2RYGhAj. 
The simple act of (digital) writing in itself, by itself, and in its entirety attracts 
and, moreover, ensures control; writing (in) the network is not only network- 
controlled, but also constitutes the very act of network control. The latter has ac- 
tually been synonymous with communication (and therefore network writing) from 
the very inception of all operational packet-switching webs: what TCP (the trans- 
mission control protocol) has been for the internet, was, before the advent of the 
latter, NCP (the network control protocol) (Baldwin 2015). Just like control, net- 
works are also always there, even when apparently not (Franklin 2012).¹¹ The 
monstrosity of control thus resides in its habitation in the very factuality of digital 
writing proper and what unites the two is their shared technologically networked 
nature. 

The paradox of any disruptive, subversive or even liberating digital writing 
endeavour—or, in our inclusive iconic acceptance, digital humanities initiative— 
thus emerges to be attempting to fight control by means of or from within control, 
using emerging network(s) against the network, and leveraging in-house complex- 
ities to escape systemic complexity. Heidegger's The Question Concerning Technol- 
ogy (1977) states a seemingly related paradox: it is particularly in the dangerous 
orientation humanity has towards technology, an orientation that “[en]frames” 
humans themselves as standing-reserve (Heidegger 1977), that humanity could 
find the potential to be rescued. The Heideggerian “rescue” by means of technol- 
ogy—when the latter means digital technology—will yet have, as it follows logi- 
cally, to work with control against control. Even if, for instance, linked open data 
(LOD) represents a utopian attempt at democratizing the digital, those articulat- 
ing them into searchable databases will still have to navigate the challenges of 
power and gatekeeping informing authority records (as established by major li- 
braries and archives) (Mattock and Thapa 2021). 

In the case of the Graph Poem, this kind of navigation posits complexity the- 
ory as the scaffolding for developing data-driven computational ‘monsters’ ana- 
lytically-creatively attempting to elude monstrous control. Monsters are hybrids 
par excellence (see the next section) and our practice involves hybrids indeed that 
behave conveniently monstrously due to the complexity informing them. 

The computationally assembled poetry anthology “US” Poets Foreign Poets 
(MARGENTO 2018), for instance, involved monstrous hybridisation on a number of 
levels. Developed as a transnational and translational project, the anthology in- 
volved or featured people in Canada, Romania, the U.S., Mexico, “Babylon,” and 


11 Franklin intriguingly concludes that even the cloud is a network, even if in its hidden inex- 
plicit layers. 
beyond, and deployed notions and methods of algorithmic translation¹² that 
shaped the data in unprecedented ways. For the translation of digital poetry 
pieces, we translated the involved algorithms, that is, developed versions of them 
or put in place completely new ones generating intra- and inter-lingual equiva- 
lents (in English and Romanian, the main two languages in the anthology, but 
also in French, Spanish, and . . . Python). Going back and forth between such 
algorithms as well as between natural languages—mainly English and Roma- 
nian—in the process obviously put a subversive spin on the data and the ways 
in which they were assembled. 

Moreover, and speaking of computational assemblage, we and our co-editors 
deployed several Python in-built algorithms for computing certain features of the 
corpus of poems represented as a graph. Yet, for the algorithmic expansion of the 
corpus, we used our own diction classifier, described alongside our rhyme classi- 
fier in Kesarwani et al. 2021, to track down potential candidates for inclusion in 
the collection that would preserve and further develop the above-mentioned fea- 
tures. This took the initial dataset down a path of centrifugal—and potentially 
self-dissolving—distortion and estrangement (plus literal foreignization, as the 
initial US slowly slipped into “us,” poets both within and beyond the imperial 
walls of the USA). The in-house algorithms instrumental in this corpus expansion 
computed and multiplied consistencies that translated into what the Python mod- 
ules could only read as paradoxical and singular. In Figure 1, we have the graph 
of the initial corpus (of 40 poems) and in Figure 2, the same, only a few stages 
later, already expanded with 13 poems. 

Here is what was noticed to be special about certain nodes in the initial cor- 
pus: poems like Harryette Mullen's “[marry at a hotel, annul 'em]”, Jerome Roth- 
enberg's “The Holy Words of Tristan Tzara”, Alan Sondheim's “zz” and Ilya 
Kaminsky's “We Lived Happily During the War” (nodes 21, 25, 31, and 12, respec- 
tively, in Graph0 [Figure 1]) proved to rank really low in terms of closeness cen- 
trality and incredibly high for betweenness centrality. Vertex 21 (Mullen), the last 
on the closeness centrality list, ended up second on the betweenness centrality 
one, while 25 (Rothenberg), on the 26th position in closeness, turned out to be 
none other than the very first in betweenness (see the lists below). Similarly, 31 
(Sondheim) ranked 30th in closeness yet 3rd in betweenness, while 12 (Kaminsky) 
got the 35th position in closeness and still went as far up as the 5th one in be- 
tweenness (MARGENTO 2018: 260). 
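The positions quoted in this paragraph can be checked against the two G0 lists reproduced in note 13. What follows is a minimal sketch in plain Python and not the project's own code: the names `closeness_g0`, `betweenness_g0` and `position` are ours and purely illustrative, the scores are copied from the note, and the nodes reported there as having zero betweenness are left out.

```python
# Ranked (node, score) pairs for G0, copied from the lists in note 13.
closeness_g0 = [
    (1, 1.0), (3, 1.0), (4, 1.0), (7, 1.0), (9, 1.0), (23, 1.0), (24, 1.0),
    (0, 0.975), (2, 0.975), (10, 0.975), (19, 0.975), (33, 0.975), (39, 0.975),
    (14, 0.951), (29, 0.951), (35, 0.951), (36, 0.951),
    (6, 0.928), (8, 0.928), (11, 0.928), (13, 0.928), (16, 0.928), (34, 0.928),
    (38, 0.928), (5, 0.906), (25, 0.906), (30, 0.906), (37, 0.906),
    (27, 0.886), (31, 0.886), (17, 0.866), (26, 0.829), (18, 0.812),
    (15, 0.795), (12, 0.78), (20, 0.78), (28, 0.78), (22, 0.75),
    (32, 0.709), (21, 0.582),
]
betweenness_g0 = [
    (25, 0.681), (21, 0.145), (31, 0.109), (1, 0.107), (12, 0.068),
    (5, 0.064), (20, 0.049), (19, 0.048), (30, 0.048), (38, 0.037),
    (32, 0.032), (24, 0.028), (18, 0.022), (39, 0.018), (22, 0.016),
    (17, 0.012), (28, 0.009), (33, 0.006), (14, 0.005), (35, 0.005),
    (6, 0.004), (27, 0.001), (34, 0.001),
]


def position(ranking, node):
    """Return the 1-based position of a node in a ranked list of pairs."""
    for index, (candidate, _score) in enumerate(ranking, start=1):
        if candidate == node:
            return index
    raise ValueError(f"node {node} is not in the ranking")


# Nodes discussed in the text: Mullen, Rothenberg, Sondheim, Kaminsky.
for node in (21, 25, 31, 12):
    print(node, position(closeness_g0, node), position(betweenness_g0, node))
```

Positions here simply follow the order of the published lists, so tied scores (for instance the seven nodes at 1.0) are ranked in the order in which the note prints them.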


12 “Transcreation” in Funkhouser's terms (MARGENTO et al. 2019). 
Figure 1: The graph of the initial corpus of poems for the anthology “US” Poets Foreign Poets 
(Margento 2018).¹³ 


13 Initial Corpus: 

{0: ‘baker-scavanger-loop.txt’, 1: ‘baldwin_internet_unconscious.txt’, 2: ‘bond-in.txt’, 
3: ‘carpenter_issue1.txt’, 4: ‘drucker_unnatural_selection.txt’, 5: ‘galvin_headlines.txt’, 
6: ‘galvin_in_my_sights.txt’, 7: ‘hejinian-i-laugh-as-if-my-pots-were-clean.txt’, 
8: ‘hejinian-preliminaries-life-after-1990.txt’, 9: ‘jhave_one.txt’, 10: ‘joudah_pulse.txt’, 
11: ‘joudah_way_back.txt’, 12: ‘kaminsky_during_war.txt’, 13: ‘kaminsky_map_of_bone.txt’, 
14: ‘levin_five_skull_diadem.txt’, 15: ‘levine_second_going.txt’, 16: ‘levine_smoke.txt’, 
17: ‘margento_&_erica_t_carter_jam_session.txt’, 18: ‘mcclatchy_mercury.txt’, 
19: ‘mencia_poem_crossed_atlantic.txt’, 20: ‘mullen-if-in-virginia.txt’, 
21: ‘mullen-marry-at-hotel.txt’, 22: ‘mullen-massa-had-a-yeller.txt’, 
23: ‘notley_at_night_states.txt’, 24: ‘rothenberg_pound_project.txt’, 25: ‘rothenberg_tzara.txt’, 
26: ‘sappho.txt’, 27: ‘scappettone_underture.txt’, 28: ‘snyder_home_from_siera.txt’, 
29: ‘snyder_what_you_should_know.txt’, 30: ‘sondheim-please-read.txt’, 31: ‘sondheim_zz.txt’, 
32: ‘stacy-doris-poem.txt’, 33: ‘starzinger-collectio.txt’, 34: ‘stefans_walkabout.txt’, 
35: ‘taylor_apocalypse_tapestries.txt’, 36: ‘vincenz_bicycle.txt’, 37: ‘waldrep_fragment1.txt’, 
38: ‘waldrep_fragment3.txt’, 39: ‘wright_apologia_pro_vita.txt’}. 


Closeness centrality for G0: 

OrderedDict([(1, 1.0), (3, 1.0), (4, 1.0), (7, 1.0), (9, 1.0), (23, 1.0), (24, 1.0), (0, 0.975), (2, 0.975), (10, 0.975), (19, 
0.975), (33, 0.975), (39, 0.975), (14, 0.951), (29, 0.951), (35, 0.951), (36, 0.951), (6, 0.928), (8, 0.928), (11, 0.928), (13, 
0.928), (16, 0.928), (34, 0.928), (38, 0.928), (5, 0.906), (25, 0.906), (30, 0.906), (37, 0.906), (27, 0.886), (31, 0.886), 
(17, 0.866), (26, 0.829), (18, 0.812), (15, 0.795), (12, 0.78), (20, 0.78), (28, 0.78), (22, 0.75), (32, 0.709), (21, 0.582)]). 


Betweenness centrality for G0: 

OrderedDict([(25, 0.681), (21, 0.145), (31, 0.109), (1, 0.107), (12, 0.068), (5, 0.064), (20, 0.049), (19, 0.048), 
(30, 0.048), (38, 0.037), (32, 0.032), (24, 0.028), (18, 0.022), (39, 0.018), (22, 0.016), (17, 0.012), (28, 0.009), 
(33, 0.006), (14, 0.005), (35, 0.005), (6, 0.004), (27, 0.001), (34, 0.001), all the other nodes = 0]). 
Figure 2: The graph of the “US” Poets Foreign Poets (Margento 2018) corpus after algorithmic expansion.¹⁴ 


14 Expanded corpus: 

{0: ‘babylonians_high_priest_prayer.txt’, 1: ‘babylonians_womanumission.txt’, 
2: ‘baker-scavanger-loop.txt’, 3: ‘baldwin_internet_unconscious.txt’, 4: ‘bond-in.txt’, 
5: ‘carpenter_issue1.txt’, 6: ‘cayley_&_howe_readers.txt’, 7: ‘dove_tafelmusik.txt’, 
8: ‘drucker_unnatural_selection.txt’, 9: ‘foarta_butterflycion.txt’, 
10: ‘funkhouser_from_hello_allocation.txt’, 11: ‘funkhouser_frontal_renewal.txt’, 
12: ‘funkhouser_margento_pyprose.txt’, 13: ‘funkhouser_margento_runes.txt’, 
14: ‘galvin_headlines.txt’, 15: ‘galvin_in_my_sights.txt’, 
16: ‘hejinian-i-laugh-as-if-my-pots-were-clean.txt’, 
17: ‘hejinian-preliminaries-life-after-1990.txt’, 18: ‘huerta_ayotzinapa.txt’, 
19: ‘jhave_one.txt’, 20: ‘joudah_pulse.txt’, 21: ‘joudah_way_back.txt’, 
22: ‘kaminsky_during_war.txt’, 23: ‘kaminsky_map_of_bone.txt’, 
24: ‘levin_five_skull_diadem.txt’, 25: ‘levine_second_going.txt’, 26: ‘levine_smoke.txt’, 
27: ‘malaparte_entire_history_of_humankind_margento.txt’, 
28: ‘margento_&_erica_t_carter_jam_session.txt’, 29: ‘mcclatchy_mercury.txt’, 
30: ‘mencia_poem_crossed_atlantic.txt’, 31: ‘montfort_code.txt’, 
32: ‘mullen-if-in-virginia.txt’, 33: ‘mullen-marry-at-hotel.txt’, 
34: ‘mullen-massa-had-a-yeller.txt’, 35: ‘notley_at_night_states.txt’, 
36: ‘rothenberg_pound_project.txt’, 37: ‘rothenberg_tzara.txt’, 38: ‘sappho.txt’, 
39: ‘scappettone_underture.txt’, 40: ‘snyder_home_from_siera.txt’, 
41: ‘snyder_what_you_should_know.txt’, 42: ‘sondheim-please-read.txt’, 43: ‘sondheim_zz.txt’, 
44: ‘stacy-doris-poem.txt’, 45: ‘starzinger-collectio.txt’, 46: ‘stefans_walkabout.txt’, 
47: ‘strickland_house_of_trust_w_hatcher.txt’, 48: ‘taylor_apocalypse_tapestries.txt’, 
49: ‘vincenz_bicycle.txt’, 50: ‘waldrep_fragment1.txt’, 51: ‘waldrep_fragment3.txt’, 
52: ‘wright_apologia_pro_vita.txt’}. 


Closeness centrality for G6: 

OrderedDict([(3, 1.0), (5, 1.0), (8, 1.0), (16, 1.0), (19, 1.0), (35, 1.0), (36, 1.0), (4, 0.981), (30, 0.981), (52, 
0.981), (2, 0.962), (6, 0.962), (49, 0.962), (17, 0.945), (18, 0.945), (20, 0.945), (24, 0.945), (26, 0.945), (27, 
0.945), (31, 0.945), (45, 0.945), (47, 0.945), (48, 0.945), (46, 0.928), (21, 0.912), (51, 0.912), (15, 0.896), (23, 
Figure 2 also shows how a comparable unusual status can be seen in some of the 
poems our in-house algorithms have meanwhile thrown in: node 9, for instance, 
“Butterflycion” by Serban Foarta,¹⁵ is the fifth lowest in closeness centrality and 
third highest in betweenness centrality, while node 47, “House of Trust” by Stepha- 
nie Strickland and Ian Hatcher, is the 22nd for closeness and second for between- 
ness. With more and more poems more and more marginal—in terms of distance 
to most of the other poem-nodes—but central in terms of helping others connect 
(betweenness centrality), the corpus will be alternately torn between disintegrative 
expansion (systemic entropy, disorder) and black hole amassing (systemic negen- 
tropy, order), but with an always insatiable avidity for foreignness. 
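How a node can be marginal in terms of distance and yet central as a connector can be illustrated on a toy graph. The sketch below is ours and makes no claim to reproduce the library routines used for the anthology: it is a plain-Python implementation of closeness centrality (via breadth-first search) and of normalized betweenness centrality (via Brandes' algorithm) for a small connected, undirected graph. The eight-node graph, with a fully connected core (nodes 0 to 5) and a short tail (nodes 6 and 7), is an invented example, and all function names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def closeness(adj):
    """Closeness = (n - 1) / sum of shortest-path distances (connected graph)."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (n - 1) / sum(dist.values())
    return result


def betweenness(adj):
    """Normalized betweenness for an undirected graph (Brandes' algorithm)."""
    n = len(adj)
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted twice; normalize by (n-1)(n-2).
    return {v: value / ((n - 1) * (n - 2)) for v, value in score.items()}


def rank(scores, node):
    """1-based position of a node when scores are sorted in descending order."""
    ordered = sorted(scores, key=lambda v: -scores[v])
    return ordered.index(node) + 1


# Invented example: a fully connected core (0-5) with a tail 0 - 6 - 7.
core = range(6)
edges = [(a, b) for a in core for b in core if a < b] + [(0, 6), (6, 7)]
graph = build_graph(edges)
c, b = closeness(graph), betweenness(graph)
print(rank(c, 6), rank(b, 6))
```

In this toy case node 6 comes out seventh of eight for closeness but second for betweenness, because every shortest path to the outlying node 7 has to pass through it. This is the same kind of disparity, on a much smaller scale, as the one reported for the poem-nodes above.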

This latter aspect, which ensures the disparity between the two centrality 
scores a new poem-node gets, is the arguable proof of the subversiveness of such 
a poetry-based computational approach. The continuous turning of the in-built 
modules of the Python library on their head can potentially be read as the butter- 
fly (or . . . “butterflycion”) effect of the at least apparently common-sensical or 
‘harmless’ insertion of an in-house (or even ‘garage-band’ kind of) procedure be- 
tween the recurrences of some ready-made algorithm or application. If control is 
monstrously ubiquitous in the digital medium, resistance can be monstrously in- 
sidious in its mathematically chaotic effects. 

Most importantly for our argument, these monstrous effects are generated by a 
complexity-informed approach: the behaviour of our two-pronged system is richer, 
harder to predict and therefore more . . . complex¹⁶ than the two tiers—ready-made 
network analysis libraries and in-house NLP algorithms—taken separately. The fact 
that the components do not scale linearly onto the resulting aggregate speaks to the 
complexity of the resulting system (the processual performative anthology), but also 
to the multi-scale architecture of such an instance of digital writing. While the cor- 


0.896), (39, 0.896), (41, 0.896), (42, 0.896551724137931), (7, 0.881), (14, 0.881), (50, 0.881), (43, 0.866), (12, 
0.852), (13, 0.852), (37, 0.852), (28, 0.838), (0, 0.825), (38, 0.8), (25, 0.787), (29, 0.787), (22, 0.776), (32, 
0.776), (40, 0.776), (34, 0.753), (10, 0.732), (9, 0.722), (11, 0.712), (1, 0.702), (44, 0.675), (33, 0.565)]). 


Betweenness centrality for G6: 

OrderedDict([(37, 0.546), (47, 0.187), (9, 0.168), (22, 0.098), (3, 0.067), (44, 0.067), (33, 0.062), (10, 
0.060), (45, 0.046), (14, 0.045), (43, 0.036), (32, 0.030), (27, 0.026), (30, 0.024), (51, 0.020), (15, 0.012), 
(34, 0.008), (28, 0.007), (48, 0.005), (36, 0.004), (1, 0.003), (23, 0.003), (42, 0.003), (7, 0.002), (38, 0.001), 
(52, 0.001), (11, 0.0007), (12, 0.0007), (13, 0.0007), (17, 0.0007), (24, 0.0007), (29, 0.0007), (39, 0.0007), 
(46, 0.0007), all the other nodes = 0]). 

15 Serban Foarta, “Buttérflycion.” Translated by MARGENTO. Asymptote July 2016. Accessed 
April 2, 2022, https://www.asymptotejournal.com/jul-2016/. 

16 In more recent iterations of the project, complexity refers literally to the networks involved, 
see MARGENTO. “Google Page Rank Poems” (2022), accessed May 29, 2022, 
https://github.com/Margento/GooglePageRankPoems and MARGENTO 2024. 
pus as a whole expands and the representing network grows in nodes and links, at 
the micro level the nodes evolve in their own idiosyncratic ways: the atypical, 
marginal and eccentric, the ‘mavericks’, draw in more and more like them. If 
one zooms in on the inner workings of such coopting of new nodes and the fea- 
tures used in it, they will find no large-scale-small-scale fractal-like correspon- 
dence. On the contrary, the newcomers will turn out to be primarily related to 
just a minority of already existing nodes (for diction-related quantifications) 
and only in an a-posteriori way to the network as a whole (once their ranking in 
terms of centralities is confirmed). 

Yet, the monstrosity of such a procedure—of analytical-creative data genera- 
tion and expansion—only emerges fully as its inherent complexity comes into 
play. It is hard, if at all possible, to predict the ways in which, generally speaking, 
such complex systems can develop. It is quite likely that in certain scenarios, the 
nodes may evolve in ways that could spectacularly impact the development of 
the entire network. While the centrifugal trend caused by the ever more mar- 
ginal incoming nodes is indeed also imprinted by the requirement for these 
nodes to be good connectors within the overall growing network, it is also con- 
ceivable for certain nodes or node subsets, as centralities are being continuously 
reshuffled, to get completely cut off, particularly since their main affiliations lie 
with a remarkable yet rather isolated minority. Even as a hypothetical possibil- 
ity, this is an indication of a deep-running discrepancy between various dynamics 
and scales in such complex environments and their disruptive or even disinte- 
grating potential. 


5 Demo(nster)s of networked textualities 
and computational (analysis-based) writing 


Montrer le monstre montré . . . (Salomon 2018: 37-38). It is not just the monster, 
but this sentence and its implicit processual drive that can describe quite accu- 
rately the networked textual performativity and intermediality informing digital 
writing. Monsters have had a spectacular comeback in recent years in digital and 
cultural studies. Megen de Bruin-Molé's “Monster Theory 2.0” (2021) has thrived 
at the intersection of remix studies and mashup, digital humanities and Gothic 
literary and/or pop culture studies. While the initial focus mostly looked at mon- 
sters as widely if not archetypal-elusive cultural constructs seen mainly from 
poststructuralist deconstructive angles (Cohen 1996), this revival speaks mainly to 
digital and networked cultures and social practices. Typically hybrid (le mixte for 
Foucault¹⁷), the monster is currently further hybridized (Neill 2022) as a character of 
popular remixes, mashups and “Frankenfictions” (de Bruin-Molé 2019). More rele- 
vant for our discussion, though, are the most recent reassessments of monsters 
and monster theory/ies within the framework of remix studies and/as digital hu- 
manities (Navas et al. 2021, de Bruin-Molé 2021) and that of remix in intermedia 
networked performance (Tanasescu 2022). 

In these two latter contexts, the monster resurfaces as a metaphor that, while 
retaining and recycling definitory traits foregrounded in early monster theory, 
transplants the monstrous from the Gothic character per se to critical transdisci- 
plinary reformulations and practices. If Gothic mashups exposed “dark undercur- 
rents” in the appropriated narratives (Neill 2022, xvi), revisiting the disciplinary and 
cultural status and prospects of digital humanities from a monster-theory-relevant 
stance occasions now both arguments in favor of monstrously decolonizing/ed digi- 
tal humanities as well as a more aware assessment of instances of remix uncritically 
perpetuating political biases and hegemony (de Bruin-Molé 2021). When present 
in performance, monstrosity—referring to community-based and collective data- 
driven subversiveness—can expose not only the undercurrents in a certain narra- 
tive, but those in the medium and space at large (Tanasescu and Tanasescu 2022) 
and reconceptualize remix in the process (Tanasescu 2022). As part of the #Graph- 
Poem @ DHSI performance series, a poetics of network walks is instrumental in en- 
acting a potentially new type of remix applied to a “hypersonnet” and involving the 
relevant code and “human-computer-intra-action” poems, on the one hand, and the 
coding and/or social-media platforms and ongoing sub-performances, on the other 
(Tanasescu 2022). Remix thus becomes a way of rewriting and reperformance within 
performance done by a hybrid sensibly different from the (Gothic) literary monster: 
a data-AI-human communal and ontologically intermedial monster (Tanasescu et al. 
2020, Tanasescu 2022). 

Such binary-straddling three-headed monsters can act as potential “glyphs” 
(Cohen 1996: 4)—i.e., icons or instances of iconicity in our terms—for the (post)digi- 
tal and digital space¹⁸ and, consequently, also for digital writing. Digital writing 
emerges from this poetics as fundamentally performative and intermedial, while 
text (as networked “glyph,” or again, iconic writing) proves itself as ever-evolving 
document or “performative relay[s]” (see above). This performativity is what ties in 
writing, and thus iconicity, with monstrosity most deeply. Demonstrate—and there- 
fore our ubiquitous and still much-in-demand demo—is etymologically related to 


17 Michel Foucault, Les Anormaux, Cours au Collège de France, 1974-1975, Hautes Études (Paris: 
Seuil / Gallimard, 1999), 1. 

18 Stephen Kennedy, Chaos Media: A Sonic Economy of the Digital Space (New York: Bloomsbury 
Academic, 2015), 87-88. 
monster: they can only be grasped when shown, when performed, when proved 
[to be there and working, i.e., performing]. In order to be proven to be effective 
and (highly) performing, it needs to be performed. Performing its performance is 
showing what is shown, montrer le. . . montré in all its monstrosity. 

There is, therefore, a certain recursiveness in digital writing's demonstrosity 
and the ways in which it not only continuously and indomitably traverses and 
generates (layers of) networks, but also—since the “monster always escapes” 
(Cohen 1996: 5)—simultaneously eludes and incorporates them. Franklin, as part 
of his critique of digital humanities practice, especially as understood and carried 
out by authors like Franco Moretti¹⁹ and his following, cautions that while dia- 
grams and graphs can be useful in processing and analysing texts, they may very 
well oversimplify the diverse nature of writing and textuality; what if, posits the 
critic, the graphs are themselves part of the text's surface and not only its underlying 
anatomy (Franklin 2017: 158). Yet, what Franklin formulated as a counter-argument 
is not always hypothetical and, in fact, it can refer not only to the ‘monstrous’ inclu- 
sion of networks in the text proper, but to their simultaneous demonst(e)rative 
involvement in the composition of the text as well (MARGENTO 2018), a text thus 
explicitly demonstrated as iconic. 

As illustrative of an analytical-creative poetics (Stefans [in Sondheim et al. 
2019], Tanasescu 2023, Tanasescu and Tanasescu 2022), the networks presented in 
the anthology and representing the growing corpus also enact that very growth 
since they are instrumental in the corpus expansion and thus in the progressive selection of 
poems that are sequentially included in the collection. Being both in and behind— 
or iconically beyond—the collection, and operating at various scales simulta- 
neously, accounts for their monstrous elusiveness and intermedia versatility, as 
well as for the resulting performativity and processuality²⁰ of the anthology per se. 
The latter thus turns out to be a collection of graph (poems) networks, while the 
graphs (or, again, the “glyphs,” in Cohen’s terms cited above, and, in our terms, in- 
stances and agents of iconicity) enact, and actually are, the writing of the collection, 
and an instance of digital writing more generally speaking. As such, the “algorith- 
mic, linguistic, and graphical expansion” that Funkhouser speaks about (MARGENTO 
et al. 2019, web) can work as an enumeration of three different facets of such writ- 
ing but also as a performative (con)fusion of the three to the point of monstrous 
indistinguishability, interoperability and unrestrainability. Most relevantly, these 
three characteristic features speak to the above-mentioned multi-scale #GraphPoem 


19 Franco Moretti, Graphs, Maps, Trees: Abstract Models for Literary History (London; New York: 
Verso Books, 2007). 

20 The “unprecedented propulsiveness” in Christopher Funkhouser's words, MARGENTO et al. 
2019. 




architecture. While linguistic expansion mainly plays out at small-scale—subset or 
individual (zoom-in)—node levels in this particular case, the graphic one is prevail- 
ingly manifest at the large-scale—(zoom-out), gradually larger and most often over- 
all comprehensive—level. Monstrosity and iconicity are inevitably intertwined 
across such levels and the processes thereof, but as hinted above, through its segre- 
gational, disruptive and potentially even disintegrative drive, the sectional and nu- 
clear levels of a graph poem are markedly monstrous, whereas, as argued in this 
section, the large or largest-scale ones are ostensively iconic. 


6 Conclusion 


Digital writing can be best understood by deploying complexity theory and an ana- 
lytical-creative paradigm that can befit its intermedial and performative multi- 
modalities, its iconicity and monstrosity. #GraphPoem is in all these respects a use- 
ful illustrative example of digital writing: the iconic and monstrous poem-network 
escapes and integrates, eludes and includes, evades and invades. As text, it gets sub- 
merged by the various levels of digital writing inscribing it in digital space and 
blowing it up in multiple directions through the commonalities with other texts in- 
stantiated by the networked corpus and the software and hardware dependencies. 
As multilayer performance, it re-emerges through its consistently branching out 
performative relays that (re)generate and analyse, document and demonstrate the 
poem and/as its mediating environments. As seen above, in either of these frame- 
works, the poem makes manifest a multi-scale architecture of digital writing re- 
flecting the complex and paradoxical inner workings of digital space. 

We argued in this paper that monstrosity is actually foregrounded as embedded 
in the (post)digital per se and therefore, just as the systemic control informing the 
latter, ubiquitous. But if control is the omnipresent and inescapable monster, the 
only way to fight it is by monstrosity as well, and this is particularly apparent in proj- 
ects such as the Graph Poem (#GraphPoem), whose analytical-creative and complex 
approach consistently attempts to disrupt and subvert hegemonic politics of data and 
algorithms. The mix of network science applications, NLP (natural language process- 
ing) and intermedia remix in computationally assembling poetry anthologies and 
livestreaming performances involves networked performative textualities outputted 
by a complex multimodal type of writing we termed iconic. Writing in general and 
digital writing in particular are fundamentally if inapparently that way; the Graph 
Poem specifically demonst(e)rates that through the propulsiveness of its graphic, lin- 
guistic and algorithmic human-computer-intra-action monstrosity. As argued above, 
these aspects are instrumental in Graph Poem’s multi-scale architecture particularly 
as made apparent in one of the project's computationally assembled anthologies 
(MARGENTO 2018), whereby the small-scale levels proved mainly monstrous and the large- 
scale ones discrepantly and predominantly iconic. 

The iconicity of graph poem multimodal writing is complemented by its icon- 
ization of digital space, in its turn based on complexity-theory-informed modelling. 
In the dynamics of this analytically-creative inter-translatable dualism resides and 
performatively thrives the poetic monstrosity of the digital. 
Benjamin Krautter 

The Scales of (Computational) Literary 
Studies: Martin Mueller's Concept 
of Scalable Reading in Theory and Practice 


Abstract: Starting from a detailed reconstruction of Martin Mueller’s theoretical 
conceptualization of scalable reading, my article focuses on the practical implica- 
tions that scalable reading has on (computational) literary studies. Based on an 
extensive analysis where I structure the concept in four different dimensions that 
are important to understand the idea of scaling, I will examine the practical con- 
sequences that scalable reading can have when analyzing literature. For this, I 
will examine different forms of literary network analysis of German plays and 
analyze them in light of the various dimensions of scaling. Doing so, I illustrate 
how qualitative and quantitative methods in literary studies can be brought to- 
gether in a fruitful way. 


Keywords: mixed methods, scalable reading, (computational) literary studies, lit- 
erary theory, network analysis 


1 Outline 


Mixed methods approaches in digital humanities have lately been described as 
a way to “move beyond the dichotomies of ‘close vs. distant’, ‘qualitative vs. 
quantitative’, [and] ‘explanatory vs. exploratory’” (Herrmann 2017: § 6), as a 
possibility to “create a space for quantitatively minded digital scholarship that 
goes beyond the trend of ‘big data,’ allowing [. . .] to craft digital hermeneutic 
strategies” (Sa Pereira 2019: 407), and even as a shaping element of “the episte- 
mic cultures of the Digital Humanities” (Kleymann 2022). Martin Mueller, a re- 
nowned Shakespeare scholar and classical philologist, has been one of the 
strongest proponents of such integrative approaches for the analysis and in- 
terpretation of literature over the last ten years. In reaction to Franco Moret- 
ti’s controversially discussed idea of distant reading, Mueller coined the term 
scalable reading.¹ For Mueller, scalable reading is meant to be a “happy
synthesis” of qualitative and quantitative methods and does not aim to overcome
reading “by bigger or better things” (Mueller 2012). Instead, he focuses on the
notion of how digital tools and methods might create new ways to mediate be- 
tween text and context. As Thomas Weitin has repeatedly pointed out, scalable 
reading represents a conceptual attempt to practically overcome the gap be- 
tween single-text readings and the analysis of larger quantities of literary 
texts (cf. Weitin 2017: 1-2; Weitin 2015: 9-14). In Mueller's case then, scaling is 
not predominantly a question of choice between big or small data, i.e., corpus 
analyses or single text readings. The operation of scaling is rather presented 
as a methodological challenge: how can literary scholars analyze texts, text 
segments or text corpora from different points of view? How can they incorpo- 
rate both quantitative and qualitative methods? How can they, rather than 
only compare them, fruitfully combine the results deriving from these differ- 
ently scaled operations? 

In this article, I will reconstruct Mueller's theoretical conceptualization of 
scalable reading and discuss the practical implications it can have on computa- 
tional literary studies. In a first step, I will contextualize the origin of Mueller's 
concept, for scalable reading emerged in response to a methodological debate on 
the challenges of literary history. (2) I will then focus on the metaphor of zooming 
that led Mueller to the term scalable reading. Why has he used the metaphors of 
zooming and, subsequently, scaling? What do they bring to the table? (3) This will 
be followed by a deeper analysis of scalable reading where I structure the concept 
in four different dimensions that are important to understand Mueller's idea of 
scaling. (4) Finally, I will examine the practical consequences that scalable read- 
ing can have when analyzing literature. For this, I will concentrate on different 
forms of literary network analysis of German plays. (5, 6) Thereby, I want to illus- 
trate how qualitative and quantitative methods in literary studies can be com- 
bined in a productive way. 


2 The challenges of literary history 


In his essay “Patterns and Interpretation” (2017), Franco Moretti claims that
digitization has completely altered the literary archive, as it has made it possible to
simultaneously study large numbers of literary texts. According to Moretti, digitization
not only enables but also calls for new scales of analysis, as it alters the way in
which we should deal with the objects of study: “when we work on 200,000 novels
instead of 200, we are not doing the same thing, 1,000 times bigger; we are doing a
different thing. The new scale changes our relationship to our object and, in fact it
changes the object itself” (Moretti 2017: 1). In light of text collections containing
tens or even hundreds of thousands of literary texts, for Moretti, “[r]eading ‘more’
seems hardly to be the solution” (Moretti 2000b: 55). In the realm of these massive
collections, close reading approaches, on the contrary, appear to be “totally
inappropriate as a method of studying literary history”, as Matthew Jockers has argued
in his book Macroanalysis (2013: 7).² Instead, Jockers proposed to adopt a position
that he calls macroanalysis and that is “primarily quantitative, primarily empirical,
and almost entirely dependent upon computation” (Jockers 2013: 32).³


1 Complementary to Mueller's term, Johannes Pause and Niels-Oliver Walkowski propose the
term scalable viewing for an examination of media products that have an aesthetic dimension.
Cf. Pause and Walkowski (2019).

Approaches such as those outlined by Moretti and Jockers have recently reignited
a methodological debate on how to appropriately investigate literary history.
Scholars regularly cite three fundamental reasons for this:
The debate is, firstly, sparked by a theoretical discussion on canon, canonicity and
its implications for our understanding of theoretical concepts and literary history
alike, which dates back to the 1970s (cf. Jannidis 2015: 659; Bode 2012: 8-9; Moretti
2020a). Robert Darnton described the canon of literary history as “an artifice,
pieced together over many generations [that] bears little relation to the actual
experience of literature in a given time” (Darnton 1989: 17).⁵ Secondly, it is based on the
promise of empirical rigor in text analysis (cf. Jannidis 2015: 659; Heuser and
Le-Khac 2011: 80; Willand 2017: 80). Advocates of quantitative methods argue that they
bring a more scientific approach to literary studies, as they question or validate
findings based on a larger number of texts (cf. Jockers 2013: 5-8). This has
oftentimes been described as a bird's-eye view (cf. Mueller 2014: § 31; Willand 2017: 93;
Jockers 2013: 19). Thirdly, this assumption seems to rest on the continuously
increasing computing power, and on the ongoing digitization of literary texts. As Fotis
Jannidis (2013: 34) puts it, the computer as a “number cruncher” is destined for
quantitative techniques of text analysis.


2 For more on Jockers’s idea of macroanalysis and how it compares to Moretti’s approach see
Krautter and Willand 2020: 83-87.

3 Contrary to this, Jockers (2013: 26) also repeatedly calls for a “blended approach” that
combines the benefits of both small- and large-scale studies: “It is exactly this sort of unification, of
the macro and micro scales, that promises a new, enhanced, and better understanding of the
literary record. The two scales of analysis work in tandem and inform each other”.

4 There is, though, a long-standing tradition using quantitative methods in literary studies. Cf.
Bernhart 2018.

5 The idea of empirical rigor in text analysis, however, has also led to strong objections, as it
encourages “the idea that computer-aided approaches to texts reduce literary works to
formalistically describable and objectively countable objects”. Gius and Jacke 2022: 2.

Much of the ongoing debate dates back or relates to Franco Moretti's
controversially discussed concept of distant reading, which he first proposed in his
essay “Conjectures on world literature” (Moretti 2000b: 56-58). In his paper,
Moretti famously - and polemically - described distant reading as a “little pact with
the devil” (Moretti 2000b: 57). By referring to what Margaret Cohen coined the
“great unread” (Cohen 1999: 23; Cohen 2009: 59; cf. Moretti 2000a: 225-227), i.e.,
virtually forgotten literature that can only be traced in archives, Moretti argues
that literary history is not limited to its canonical fraction and should not be
extrapolated from a few exemplary texts (cf. Moretti 2000a: 207-209). He postulates,
for instance, that the canon of nineteenth-century British novels consists “of less
than one per cent of the novels that were actually published” (Moretti 2003: 67).
His own ambitious endeavor, namely doing world literature, rests on his
materialistic view on literary history (cf. Moretti 2006: 84; Moretti 2013: 121). To answer
his research question, he therefore seeks to incorporate the largest possible
corpus. Aiming to overcome the limitations of and flaws in literary history that
he himself perceives, Moretti wants to shift the focus from exceptional texts to
“the large mass of [literary] facts” (Moretti 2003: 67).

To cope with this enormous amount of literature, Moretti proposes to do
“‘second-hand’ criticism” (Moretti 2000b: 61, FN 18). In his opinion, literary history
needs to become “a patchwork of other people's research” (Moretti 2000b: 57)
that no longer relies on the close reading of literature:


[T]he trouble with close reading (in all of its incarnations, from the new criticism to decon- 
struction) is that it necessarily depends on an extremely small canon. This may have become 
an unconscious and invisible premiss by now, but it is an iron one nonetheless: you invest so 
much in individual texts only if you think that very few of them really matter. Otherwise, it 
doesn't make sense. And if you want to look beyond the canon [. . .] close reading will not do 
it. It’s not designed to do it, it’s designed to do the opposite. (Moretti 2000b: 57) 


Initially, he based his suggestion of distant reading, which he later claimed to have
been a “fatal formula” that was partly “meant as a joke” (Moretti 2013: 44), on two
methodological ideas: firstly, reading secondary literature instead of primary
literature, i.e., analyzing other researchers’ analyses and blanking out primary sources;
and, secondly, activating research networks with experts on different languages, 
epochs and genres (cf. Moretti 2000b: 58-60). In the end, his idea of world literature 
depends on bringing together as much knowledge on as many literary works as 
possible. His focus deliberately shifts from understanding and interpreting a single 
literary text — in Moretti's view this is a task where close reading excels — to detect- 
ing patterns in large literary corpora. For Moretti, these patterns have the power to 
explain literary conventions and their historical development: “conventions matter
for us much more than any individual text: they matter as such” (Moretti 2017: 7). 
Doing distant reading, “the reality of the text undergoes a process of deliberate re- 
duction and abstraction”, where a bigger picture emerges, states Moretti (2005: 1). 
In this sense, distant reading “allows you to focus on units that are much smaller 
or much larger than the text: devices, themes, tropes — or genres and systems” 
(Moretti 2000b: 57). 

When Moretti introduced the term distant reading in 2000, he neither
referred to computational approaches, nor specifically mentioned quantitative
methods to analyze literature. A first connection to quantitative methods was
drawn by Moretti three years later, in an essay called “Graphs, Maps, Trees”
(2003). There, he suggests incorporating abstract models imported from
other disciplines, namely quantitative history, geography and evolutionary
theory (cf. Moretti 2003: 67). Since then, the connection from distant reading to
computational methods has been deepened - mainly from 2010 onwards, when
the Stanford Literary Lab was formed (cf. Underwood 2017: § 29-35).⁶ Nowadays,
distant reading has become a byword for quantitative corpus analyses (cf.
Underwood 2017: § 7-11; Weitin et al. 2016: 104). In the last 20 years, however, Moretti’s
polemical term, his trenchant provocations (cf. English 2010: xiv), and his
preference for abstract models have drawn much criticism. His critics - and critics
of quantitative approaches in general - object either to the principle of analyzing
art by computation (cf. Lamping 2016) or to the lack of innovative and new insights
compared with the massive efforts one must put in (cf. Schulz 2011), or they dispute
the exploratory value of computational methods, judging them to be nothing
but “mere play” (Fish 2012).

Meanwhile, the debate has shifted its direction, at least partly. Instead of 
choosing sides between close and distant reading, a discussion on how to mean- 
ingfully combine small- and large-scale research in literary studies has emerged 
(e.g., Weitin 2017: 1-2). This has been accompanied by a discussion on how to in- 
terweave qualitative and quantitative methods of text analysis and, subsequently, 
how to combine interpretative readings with statistical evaluations (e.g., Willand 
2017: 77-93). For this purpose, the term mixed methods has been partly adopted 
from social sciences (e.g., Herrmann 2017). Moacir de Sa Pereira (2019: 405), for 
instance, states that “a conscious use of mixed methods would push aside the
binarism and let literary study be a counting discipline at the same time as a reading
one”. Mueller (2012) argued similarly when he criticized distant reading for not
“express[ing] adequately the powers that new technologies bring to the old
business of reading”. In Mueller’s view, the term “implicitly set[s] ‘the digital’ into an
unwelcome opposition to some other”.


6 Peer Trilcke and Frank Fischer (2016: 13) point out that in his recent publications Moretti no
longer uses the term distant reading. Instead, he refers to computational criticism or simply
digital humanities.

In the following, I will elaborate on Mueller’s own conceptualization of
mixing methods that goes by the term scalable reading.⁷


3 From Google Earth to scalable reading 


“The charms of Google Earth led me to the term Scalable Reading as a happy
synthesis of ‘close’ and ‘distant’ reading”, said Martin Mueller (2012) in one of his
programmatic blog posts entitled Scalable Reading. Mueller's enthusiasm for Google
Earth is tied to one specific operation: zooming. In his opinion, zooming facilitates
exploration. Just a few clicks on the digital map enable a change of perspective
from small details of individual streets to a global view of the world and vice versa.
By zooming out, streets can be seen in the context of the entire city, and the city, in
turn, in the context of the state or the country. By zooming in, features like mountains
or rivers can lose their structuring significance, while finer details emerge.
Changing scale and thus changing perspectives becomes a matter of a few clicks: “you
can zoom in and out of things and discover that different properties of phenomena
are revealed by looking at them from different distances” (Mueller 2012).

In contrast to Google Earth, where zooming in and out is easy to operate, the 
corresponding operation comes with much greater intellectual demands when ap- 
plied to literary texts: for instance, a literary analysis that is based on a single text 
passage (zooming in) cannot be hermeneutically interpreted in the context of the 
author's oeuvre (zooming out) without considering co-texts and contexts (cf. Kraut- 
ter and Willand 2020: 78-79). Hence, the attempt to metaphorically zoom out of a 
literary analysis means that scholars need to read (many) other texts by the author 
and her or his contemporaries. Only then does it seem possible for compositional
principles and patterns to emerge, i.e., to reach a higher level of abstraction. By
integrating and scaling different text analytical methods, Mueller, however, identifies
an opportunity to transfer the motion of zooming from Google Earth to literary
texts in a direct manner. In his case, scaling is neither a question of corpus
selection, nor is it a simple choice between big and small data or some sort of an
in-between level. Mueller rather presents the operation of scaling as a methodological
challenge: how can literary scholars analyze texts, text segments or text corpora
from different points of view, i.e., with quantitative and qualitative methods alike?


7 Chapters 3 and 4 of this article are an adaptation of a more detailed study that is part of my
forthcoming dissertation.

Mueller illustrates these efforts by using the example of medieval Bible
concordances. For him, the concordances act as a textual equivalent to Google Earth.
By compiling alphabetical word lists and identifying the corresponding text
passages, the monks facilitated zooming, according to Mueller. With the help of the
now accessible parallel texts, it had become possible to systematically analyze 
and understand individual passages in the context of a bigger picture, i.e., other 
analogous passages. The monks assumed that the reorganization of the Bible into 
word lists would help them with the exegesis of God's word and, in consequence, 
to better understand the order of God's world. Thus, the text-internal understand- 
ing of the Bible can be transferred to the text-external world. Mueller recognized 
this analogy in Google Earth. As a user of Google Earth, one expects that the scal- 
able map has an orientation function for the reality represented. When Google 
guides the user to the next train station by means of its digital map, the user does 
expect to arrive at this train station in reality. 
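
The principle of a concordance, an alphabetical word list that points back to every passage in which a word occurs, can be illustrated with a minimal Python sketch. It is not a reconstruction of the medieval concordances or of any tool Mueller describes; the function name build_concordance, the passage labels and the choice of two verses from the King James Version are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Two verses (King James Version) serve as a tiny stand-in corpus.
PASSAGES = {
    "Gen 1:1": "In the beginning God created the heaven and the earth.",
    "John 1:1": "In the beginning was the Word, and the Word was with God, "
                "and the Word was God.",
}


def build_concordance(passages):
    """Return an alphabetical word list mapping each word to its passages."""
    index = defaultdict(list)
    for label, text in passages.items():
        # Record every distinct word form of the passage once.
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].append(label)
    return dict(sorted(index.items()))


concordance = build_concordance(PASSAGES)
print(concordance["beginning"])  # passages that can be read in parallel
```

Looking up a word returns the parallel passages; comparing and interpreting them remains the reader's task.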

Why is this methodical association of Google Earth and Bible concordances
relevant for Mueller's idea of scalable reading? For Mueller, the concordances
can be understood as a hinge that connects the idea of zooming in Google Earth
with scaling in computational literary analysis: “Strip a fancy text retrieval
system to its basic operations, and you find a concordance on steroids, a complex
machine for transforming sequential texts into inventories of their parts that can
be retrieved and manipulated very fast” (Mueller 2012). He argues that the basic
functional principle of modern tools for computational text retrieval is still
comparable with that of concordances. They support you in finding specific text elements
and, by doing so, help to accelerate “forms of non-contiguous reading”
(Mueller 2014: § 9). Even the difficulties that arise from using these tools seem similar
to those the monks experienced nearly 800 years ago. Algorithms can identify
statistical patterns in a data set, but they cannot attribute meaning to it. Just as the
monks had to compare and interpret the parallel texts listed in the concordance,
the identified patterns in a computational analysis must be contextualized and
understood.⁸ Concordances and digital tools alike, thus, change the way in which
we deal with the objects of study incrementally, i.e., in small steps.⁹ As an
example for this stepwise transformation, Mueller reflects on the usage of surrogates
in literary studies. “Our typical encounter with a text is through a surrogate -
setting aside whether there is an original in the first place,” writes Mueller (2013).
Surrogates, then, represent the textual basis for reading, analyzing and
interpreting literature. However, they are not congruent with their original form of
publication. Mueller argues, for instance, that the script of a play, most of the time, is
“meant to be seen and heard in a live theatre rather than read. But however
paltry a surrogate the printed text may be, for some purposes it is superior to the
‘original’ that it replaces” (Mueller 2005).


8 Mueller (2012) has called this the “‘last mile problem’ of human understanding.”
9 Here, Mueller (2013) refers to a report, which Douglas Engelbart prepared for the Air Force
Office of Scientific Research.

Mueller (2014, passim) illustrates that every surrogate has its own “query
potential,” which explains the coexistence of different surrogates referring to the
same original literary work.¹⁰ In consequence, reading Shakespeare's plays in
Bevington, Riverside or Norton editions already classifies the reader as a “distant reader”
(Weitin 2015: 9; cf. Weitin et al. 2016: 115), for the reader no longer works on the
original, but with a representation of the original. For Mueller, digital
machine-readable texts are just one step further into “the allographic journey of text[]”
representations (Mueller 2014: § 10). He argues that they possess a “second-order
query potential” comparable to Bible concordances (Mueller 2008: 288). Thus, they
can be transformed, analyzed, or evaluated in ways that would either require an
enormous effort or do not seem possible for printed texts at all.¹¹ Obvious
examples are calculations based on bag-of-words¹² representations, such as those
John Burrows (2002), amongst others, has used for his stylometric studies on
authorship attribution.
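
What a bag-of-words representation retains and what it discards can be shown in a few lines of Python. The sketch below is only an illustration of the representation itself; it does not reproduce Burrows's stylometric procedure, and the function names and the two example sentences are invented for this purpose.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count word forms; word order and syntax are deliberately discarded."""
    return Counter(re.findall(r"\w+", text.lower()))


def relative_frequencies(bag):
    """Normalize counts by text length so that texts of different size compare."""
    total = sum(bag.values())
    return {word: count / total for word, count in bag.items()}


first = bag_of_words("the father fears the son")
second = bag_of_words("the son fears the father")

# Both sentences mean different things, yet their bags of words are identical.
print(first == second)
print(relative_frequencies(first)["the"])
```

The two sentences yield the same representation although they say different things, which is exactly the reduction that makes large-scale frequency comparisons possible.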


4 The different dimensions of scaling 
in literary analyses 


It is already clear that scales in Mueller's metaphorical conception of scalable
reading extend along several dimensions. In the following section, I argue that
four different dimensions are important to practitioners reflecting on their own
approaches when integrating qualitative and quantitative methods to analyze
literature.


10 It is not always clear what should be considered the original work (cf. Grubmüller and
Weimar 2017: 415-416). This does not, however, compromise Mueller's argument with respect to
digital analyses.

11 Stephen Ramsay (2013: 490) argues similarly: “It is one thing to notice patterns of vocabulary,
variations in line length, or images of darkness and light; it is another thing to employ a machine
that can unerringly discover every instance of such features across a massive corpus of literary
texts and then present those features in a visual format entirely foreign to the original
organization in which these features appear. Or rather, it is the same thing at a different scale and with
expanded powers of observation”.

12 In a bag-of-words model, a given text or a corpus of texts is represented as the multiset of
its words, i.e., by its word frequencies. A bag-of-words model disregards word order and syntax.

As indicated, the textual basis of literature is, firstly, available in a broad 
scale of surrogates, which oftentimes coexist (cf. Weitin et al. 2016: 115). There are 
oral texts, manuscripts, contemporary publications, complete works, historical- 
critical editions, digitized texts, specifically encoded text corpora, born digitals 
and a lot more. Furthermore, one can manipulate these surrogates, whether they 
come in analogue or in digital form. One can focus on specific segments, prepare 
concordances and word frequency lists or concentrate on linguistic features such 
as parts of speech. Remodeling surrogates in such ways, however, changes the ep- 
istemic object (cf. Trilcke and Fischer 2018). We are then no longer analyzing the 
literary text as a whole, but an abstract representation of it, which might solely
consist of relations between literary characters, as in the case of character
configurations and the resulting networks.¹³

The most obvious scale, secondly, is probably the question of scope, i.e., the 
size of the object of study. Is the goal of a study to examine a single literary text, 
perhaps even an extract of a single text? Or is it to investigate a larger number of 
texts? How extensive does this corpus then turn out to be? The quantitative scope 
is additionally related to the heterogeneity of the chosen literary texts (cf. Gius 
2019: 8-9). In which languages are the texts written? Do they belong to the same 
genre? Which literary period do they originate from? 

The scope of investigation, thirdly, impacts the units of analysis. As stated by 
Moretti, investigations that focus on a large corpus of texts allow us to analyze 
units that are smaller or bigger than the text. By doing distant reading or macro- 
analysis the units thus deviate from the typical meso-scale of literary studies, as 
Carlos Spoerhase (2020: 6-7) has pointed out. The goal is no longer to understand 
a single literary text. Instead, large-scale corpus analyses emphasize infratextual
or supratextual units, such as the distribution of individual word forms or the
diachronic development of genres. For Matthew Jockers (2013: 22), “the results of
macroscopic text-mining” are able to “aggregate[] a number of relatively small
details into a more global perspective”. Jockers (2015) argues that, unlike close
reading, which is more or less bound to the meso-scale, macroanalysis empowers
“[s]cale hopping”. In contrast to close reading, macroanalysis would “allow for both
zooming in and zooming out” (Jockers 2013: 23). For Spoerhase, however, it is
obvious that in the humanities, when scaling the research corpus and thus the
research question, scholars implicitly preselect the units they observe, interpret
and evaluate (cf. Spoerhase 2020: 8). Accordingly, a large corpus of literary texts
not only enables the scholar to look at supra- and infratextual units; it makes
doing so a necessity.


13 Of course, literary network analysis is not limited to the analysis of character relations.
Maciej Eder (2017: 50-64), for instance, has used networks to visualize stylometric differences
between different authors.

Just like Moretti and Jockers, Mueller emphasizes that quantitative analyses 
allow for both zooming out and zooming in: “Digital tools and methods certainly 
let you zoom out, but they also let you zoom in, and their most distinctive power 
resides precisely in the ease with which you can change your perspective from a 
bird’s eye view to close-up analysis” (Mueller 2014: § 31). Mueller, though, does
not tie the metaphor of zooming to the size of the corpus, i.e., the number of liter- 
ary texts to be investigated. For him, zooming neither excludes the meso-scale, 
nor is it a one-off operation, where scholars once set a specific scale that deter- 
mines the perspective of their study. His conception, fourthly, rather considers 
analytical methods themselves to be scalable. As Thomas Weitin has pointed out, 
Mueller's idea of scalable reading includes all acts of reading and analysis. Thus, 
depending on the research question, various qualitative and quantitative
methods can coexist equally and can be brought together in a combinatorial way.
Unlike what the binary terms close and distant reading or micro- and macroanalysis
imply, Mueller's concept highlights that the purposes of either qualitative or
quantitative methods are not determined a priori. To know which method works best
is not (only) a question of corpus size. As I have shown, Mueller's concept of scal- 
able reading challenges this simplistic assumption and advocates for a position 
that takes into account all scales of analysis. 


5 A practice of scalable reading? 


In practice, however, scholars have used scalable reading rather as a catchphrase
to oppose Moretti's polemical dichotomy than as a methodological approach (cf.
Weitin 2017: 1-6; Weitin 2015: 9-14).¹⁴ In the following section, I try to highlight
the intricacies attached to a practice of scalable reading within the digital
humanities in regard to the four dimensions I have laid out. For illustration purposes, I
have chosen to focus on one of the most prominent methods in digital humanities,
namely network analysis (cf. Jannidis 2017b: 147). The idea of employing network
analysis on literary texts has been imported from the empirical social sciences,
where networks are used to model social relations (cf. Trilcke 2013: 201). In
computational literary studies, network analysis is used for different purposes: to
analyze character networks in a corpus of plays,¹⁵ short stories (cf. Jannidis
2017a), or novels (cf. Rochat 2015), to differentiate between literary genres (cf.
Evalyn et al. 2018; Hettinger et al. 2016), or to identify main characters (cf. Fischer
et al. 2018; Jannidis et al. 2016). Graph theory supplies the mathematical basis of
network analysis. It operates with two constitutive components, so-called nodes
(or vertices) and edges. In its simplest form, a network can be described as “a
collection of points joined together in pairs by lines” (Newman 2010: 1), wherein
the points are called nodes and the lines are called edges. Nodes that are
connected by edges relate to each other. The type of relationship depends on the
entities that are represented by the nodes and on the form of interaction the edges
describe. One might think of characters talking to each other or, going beyond
literary texts, of authors corresponding with each other. I will focus on two
different kinds of interactions of characters in German plays that result in different
kinds of networks: co-presence networks and coreference networks. In this way, the
networks become an abstract representation of a very specific trait of the plays,
depending on how the interaction between characters, i.e., the edges connecting
the nodes, is precisely defined.


14 There are of course some exceptions, e.g., Horstmann and Kleymann 2019.
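
The two components of a network, nodes and edges, translate directly into simple data structures. The following Python sketch uses four hypothetical characters A to D (they do not stand for figures of any particular play) and computes two elementary measures, the degree of each node and the density of the network; the function names are chosen for illustration.

```python
# Hypothetical characters A-D; each edge is an unordered pair of nodes.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def degrees(nodes, edges):
    """Number of edges attached to each node."""
    result = {node: 0 for node in nodes}
    for source, target in edges:
        result[source] += 1
        result[target] += 1
    return result


def density(nodes, edges):
    """Share of realized edges among all possible pairs of nodes."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1))


print(degrees(nodes, edges))
print(round(density(nodes, edges), 2))
```

What such values mean for a play depends entirely on how an edge has been defined, which is why the operationalization of interaction matters so much in what follows.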

Figure 1 shows a character network of Friedrich Schiller’s play Die Räuber
(1781). In this network, each node depicts one of the play’s characters. The edges
connecting the nodes display that “two characters are listed as speakers within a
given segment of a text (usually a ‘scene’)” (Trilcke et al. 2015: 1). Networks like
this are called co-presence networks (cf. Trilcke 2013, passim) and account for a
large part of literary network analysis of plays. They owe their popularity
mainly to two reasons. Firstly, they build upon structuralist precursors and drama
analytical insights of the 1960s and 1970s. Solomon Marcus’ concept of dramatic
configuration, which soon became handbook knowledge (cf. Pfister 1988: 171-176;
Asmuth 2016: 44-47), is particularly important for the theoretical justification of
co-presence networks. Configuration, in this case, means “the section of the
dramatis personae that is present on stage at any particular point in the course of the
play” (Pfister 1988: 171), and can be used for segmentation purposes. Marcus’
studies on dramatic configuration have not only been praised as proto-network
analytical approaches to character relations in plays (cf. Trilcke 2013: 221), they also
exemplify that quantifying the segmentation of a play with regard to changes in
configuration has long been established. In this respect, Manfred Pfister has
emphasized that “[e]very entrance or exit of one or more” characters is “a significant
development on this level of segmentation” (Pfister 1988: 234). Secondly, co-presence
networks can be automatically created if the plays are encoded accordingly. This
allows the scholar to straightforwardly compute a large number of networks from
different plays and compare them not only visually, but also with regard to different
mathematical network measures.


15 For one of the pioneering studies see Stiller et al. 2003.

For my analyses, I use a corpus of plays stemming from the Drama Corpora 
Project, which currently consists of 16 different text collections (cf. Fischer et al. 
2019). As of now (September 19, 2023), the biggest collections are the French 
Drama Corpus (1560 plays), the German Drama Corpus (644 plays) and the Rus- 
sian Drama Corpus (212 plays). All plays collected in the Drama Corpora Project 
are encoded according to the same Text Encoding Initiative (TEI) format.
Important structural information, such as the distinction between main and secondary
text or the marking of act and scene boundaries, is therefore machine-readable.
The following excerpt of Schiller’s Die Räuber gives insight into the arrangement
of a TEI encoded play. The play is structured by different tags. <div type="scene">,
for instance, marks a new scene, while <sp who="#[. . .]"> assigns the subsequent
character speech to an identifier. This information is essential for ensuing analyses, as
it is possible to extract it deterministically from the encoded plays.


[Network graph; legible node labels: Karl Moor, Amalia, Grimm, Schwarz, Franz Moor]

Figure 1: Co-presence network of Schiller’s Die Räuber.
Text excerpt from Schiller’s Die Räuber encoded in TEI, taken from the Drama Corpora
Project.

<div type="act">
  <head>Erster Akt</head>
  <div type="scene">
    <head>Erste Szene</head>
    <stage>Franken. Saal im Moorischen Schloß.</stage>
    <stage>Franz. Der alte Moor.</stage>
    <sp who="#franz_von_moor">
      <speaker>FRANZ.</speaker>
      <p>Aber ist Euch auch wohl, Vater? Ihr seht so blaß.</p>
    </sp>
    <sp who="#der_alte_moor">
      <speaker>DER ALTE MOOR.</speaker>
      <p>Ganz wohl, mein Sohn - was hattest du mir zu sagen?</p>
    </sp>
    [. . .]


All the information necessary to compute a co-presence network as seen in Figure 1
is part of the TEI-encoded plays. One can automatically extract the required
information and subsequently represent it in the form of a matrix. Table 1 shows
an excerpt of such a matrix for Die Räuber. This matrix provides the basis for the
network visualization (Figure 1) and all subsequent calculations of network measures.
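
To make the extraction step concrete, the following is a minimal sketch in Python; it is not the procedure actually used for the analyses reported here. It assumes a shortened, namespace-free fragment modelled on the excerpt above. The second scene and the function names `speakers_per_scene` and `configuration_matrix` are illustrative assumptions. Real TEI files declare the TEI namespace, so element names would have to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened fragment without namespaces.
# The second scene is invented for illustration only.
TEI_SNIPPET = """
<div type="act">
  <div type="scene">
    <sp who="#franz_von_moor"><p>...</p></sp>
    <sp who="#der_alte_moor"><p>...</p></sp>
    <sp who="#franz_von_moor"><p>...</p></sp>
  </div>
  <div type="scene">
    <sp who="#karl_moor"><p>...</p></sp>
    <sp who="#spiegelberg"><p>...</p></sp>
  </div>
</div>
"""


def speakers_per_scene(xml_text):
    """Return one set of speaker identifiers per <div type="scene">."""
    root = ET.fromstring(xml_text)
    scenes = []
    for div in root.iter("div"):
        if div.get("type") != "scene":
            continue
        speakers = set()
        for sp in div.iter("sp"):
            # @who may list several identifiers separated by spaces
            for ref in sp.get("who", "").split():
                speakers.add(ref.lstrip("#"))
        scenes.append(speakers)
    return scenes


def configuration_matrix(scenes):
    """Rows are scenes, columns are characters; 1 means 'speaks in this scene'."""
    characters = sorted(set().union(*scenes))
    rows = [[1 if c in scene else 0 for c in characters] for scene in scenes]
    return characters, rows


scenes = speakers_per_scene(TEI_SNIPPET)
characters, matrix = configuration_matrix(scenes)
print(characters)
print(matrix)
```

Each row of `matrix` corresponds to one scene and each column to one character, which mirrors the layout of Table 1.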

However, the prospect of automating network creation imposes some limitations
on formalizing the interactions of characters. As I have outlined above, the
interactions represented in Table 1 and Figure 1 refer to characters speaking in a particular
segment of text. Here, these segments are specified as scenes. They are denoted as
such by the play's secondary text. This operationalization of interaction, then, is
similar to but not congruent with Marcus' understanding of configuration. A play's
configuration changes whenever a character enters or exits the stage, i.e., whenever the
dramatis personae on stage changes (cf. Marcus 1973: 315-319). In plays that follow
the principles of French Classicism, entrances and exits of characters designate a
new scene - at least in theory. Configuration and scene then coincide. Plays that are
influenced by Shakespeare follow a different principle. In Shakespearian theatre,
scenes are bound to a change of location (cf. Ranke 2010: 710). Therefore, several
characters can enter or exit the stage without designating a new scene. As entrances
and exits of characters are not (yet) encoded in the Drama Corpora Project, the
automatic creation of co-presence networks relies on scenes as a segmentation. This can
lead to negative side effects. Schiller's Die Räuber is a good example to outline the
problem. Solomon Marcus has already pointed towards the significant discrepancy
between the number of configurations and the number of scenes in Die Räuber, as
he counted 78 configurations in the course of the 15 scenes (cf. Marcus 1973:
326-333). The matrix in Table 1 depicts whether a character speaks in a certain
scene of the play. Looking at the matrix, one can easily identify which characters
speak in the same scene. Daniel and Kosinsky, for instance, both speak in Scene 11
(IV, 3). According to the above-mentioned operationalization, this is sufficient to
create an edge between the two characters in the network (see Figure 1). Kosinsky,
however, only enters the stage after Daniel has already left it. Hence, Daniel and
Kosinsky never actually meet on stage; both of them talk to Karl Moor, but in
separate instances. While in the first three acts the sphere of the family and the sphere
of the robbers have been occupied by disjoint groups of characters, Karl
manages to cross this boundary in the fourth act. A network analysis that rests on scenes
as segmentation shows Kosinsky to also cross this boundary. This error would
diminish the special role Karl occupies (cf. Krautter and Willand 2021: 117-118).
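
The effect of the chosen segmentation can be illustrated with a small Python sketch. It is a simplification under stated assumptions: Scene 11 is reduced to the three characters named above, and its split into two configurations is a hypothetical stand-in for the actual sequence of entrances and exits. The function name `co_presence_edges` is illustrative.

```python
from itertools import combinations


def co_presence_edges(segments):
    """Connect every pair of characters that speak in the same segment."""
    edges = set()
    for speakers in segments:
        edges.update(combinations(sorted(speakers), 2))
    return edges


# Scene-based segmentation: the whole scene counts as one segment.
scene_level = [{"Karl Moor", "Daniel", "Kosinsky"}]

# Configuration-based segmentation (simplified, hypothetical split):
# Daniel leaves before Kosinsky enters.
configuration_level = [{"Karl Moor", "Daniel"}, {"Karl Moor", "Kosinsky"}]

spurious = co_presence_edges(scene_level) - co_presence_edges(configuration_level)
print(spurious)  # {('Daniel', 'Kosinsky')}
```

Only the scene-based segmentation produces the edge between Daniel and Kosinsky; the edges to Karl Moor are present in both variants.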


Table 1: Excerpt of the configuration matrix of Schiller's Die Räuber. A value of ‘1’ indicates that a 
character speaks at least once in the corresponding scene. 


Franz Moor   Der alte Moor   Karl Moor   Amalia   Spiegelberg   Daniel   Kosinsky

Scene 1 1 1 

Scene 2 1 1 

Scene 3 1 1 

Scene 4 1 

Scene 5 1 1 1 1 

Scene 11 1 1 1 

Scene 15 1 1 1 


Looking at Figure 1, one can see that the nodes and edges of the network carry 
some additional information stemming from the matrix. On the one hand, the 
edges are weighted, i.e., they do not only represent that the connected characters 
do speak in the same segment, but their strength also indicates the number of seg- 
ments in which both characters speak. A stronger edge signifies more segments. 
On the other hand, the size of the nodes corresponds to the so-called degree, i.e., a
simple measure for centrality in a network. Centrality measures try to determine 
the importance of a node for certain constellations within the network structure. 
The degree of a node simply corresponds to the number of other nodes that
share an edge with the node in question (cf. Newman 2020: 133-135). The larger
a node is drawn, the higher its degree.
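
How edge weights and degrees follow from a configuration matrix can be sketched as follows. The matrix values and the character labels A to D are invented for illustration, and the function names `weighted_edges` and `degrees` are assumptions rather than part of any existing tool.

```python
from collections import Counter
from itertools import combinations

# Toy matrix: rows are segments, columns are characters (invented values).
characters = ["A", "B", "C", "D"]
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]


def weighted_edges(characters, matrix):
    """Edge weight = number of segments in which both characters speak."""
    weights = Counter()
    for row in matrix:
        present = [c for c, flag in zip(characters, row) if flag]
        for pair in combinations(present, 2):
            weights[pair] += 1
    return weights


def degrees(characters, weights):
    """Degree = number of distinct neighbours, regardless of edge weight."""
    degree = {c: 0 for c in characters}
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return degree


weights = weighted_edges(characters, matrix)
degree = degrees(characters, weights)
print(weights[("A", "B")])  # 2
print(degree)               # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```

In this toy case A and B share the strongest edge, yet C has the highest degree, which shows that the two kinds of information are independent of each other.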
5.1 Analyzing individual co-presence networks


Now that I have outlined the essential premises of literary network analysis in 
general and co-presence networks in particular, I would like to raise the question: 
How can they meaningfully be embedded in drama analysis? And how do net- 
works relate to the four dimensions of scalable reading? Franco Moretti's essay 
“Network Theory, Plot Analysis” (2011) has shown promising signs of integrating
network visualizations into the hermeneutic process of understanding literature. 
Moretti's discussion of Shakespeare's Hamlet (1609) rests upon his manually cre- 
ated network visualization, wherein he identifies Horatio as an important charac- 
ter for the network's structure. Horatio “inhabits a part of the network where 
clustering is so low that, without him, it disintegrates” (Moretti 2011: 6). Similar to
close reading, Moretti dissects the network into different individual parts and 
merges his network-analytical observation with broad text-analytical findings. Ac- 
cording to him, “Horatio has a function in the play, but not a motivation. [. . .] I 
can think of no other character that is so central to a Shakespeare play, and so 
flat in its style” (Moretti 2011: 7).

Critics have accused Moretti of betraying his own distant reading agenda. An- 
alyzing Hamlet, Moretti's intuition seems to be more important for the interpreta- 
tion of the network than statistical evaluations. His vision to provide explanatory 
models based on large corpora that extend beyond understanding and interpret- 
ing individual canonical texts seems to fall flat (cf. Trilcke and Fischer 2016: 12; 
Prendergast 2005: 45). On the one hand, Moretti's writings oftentimes rest upon 
tentative thoughts and ideas, which intend to provide ample food for thought 
rather than presenting thoroughly tested and validated results. He himself admits 
that it was “less concepts than visualization” which he “took from network theory,”
as a large-scale analysis was not yet feasible for him due to data-gathering
reasons (Moretti 2011: 11). On the other hand, his understanding of the term dis- 
tant reading constantly changes. In the first few pages of his book Graphs, Maps, 
Trees (2005) Moretti, for instance, argues that distant reading is mainly a process 
“in which the reality of the text undergoes a process of deliberate reduction and 
abstraction” (Moretti 2005: 1), which is exactly what he did with his network analysis
of Hamlet.

How, then, does Moretti's study fit with Mueller's concept of scalable reading?
Looking at the four dimensions, the most striking change compared to a tradi- 
tional approach of reading and interpreting Shakespeare's Hamlet is the radical 
change of surrogate. Moretti no longer interprets the play's text, but its abstract 
representation as a network with nodes and edges. While the scope of his study is 
limited to just one play - the typical meso-scale of literary studies - the text
representation, in turn, changes the units of analysis. His sole focus lies on the
characters’ co-presence.¹⁶ In his own words, the analysis rests upon “the possibility of
extracting characters and interactions from a dramatic structure, and turning them
into a set of signs” (Moretti 2011: 11). In other words, he combined quantitative and
qualitative methods, firstly, in order to create, and secondly, to interpret the
networks. Is this, then, equivalent to what Mueller has referred to as scalable reading?

In this regard, it is the conclusion Moretti has drawn from his network analysis
that is striking. The network itself and, subsequently, the characters' degree as
well as their average distance (also called average path length) in the network¹⁷ in
his opinion demand “a radical reconceptualization of characters and their
hierarchy” (Moretti 2011: 5). For Moretti, established concepts in literary studies such as
the protagonist of a play and the network analysis' findings can hardly be fruitfully
combined. Abstract models have an explanatory power that, according to him, does
not blend in with “concepts of ‘consciousness’ and ‘interiority’” (Moretti 2011: 4),
which are common in literary studies. Mueller has criticized positions like these
when he mentioned the ‘unwelcome opposition’ created by the term distant
reading. While Moretti's insights mostly rest on intuitive interpretations of the network,
he has no intention to integrate them into an interpretation of the play's text; the
play’s text and the play's network have become different objects.¹⁸ For Moretti,
then, it does not seem desirable to combine network analysis with the meso-scale of
literary studies.

Looking at Figure 1, it becomes obvious that not all co-presence networks profit
in the same way from the “‘intermediate’ epistemological status of visualization” that
Moretti praises in his essay (Moretti 2011: 11). The network of Schiller's Die Räuber
suffers from a lack of readability, which can be attributed to both the absence of the
play’s temporal level and the big group of robbers in the centre of the network.¹⁹ In
consequence, the disjunct social spheres of robbers and family, which on a structural
level are defining for the first three acts of the play, are not perceptible in the
network visualization. On the contrary, it seems that the robber band with Schwarz,
Grimm and Karl Moor is at the centre of the play, while the family, i.e., Franz, Der
alte Moor and Amalia, have only a peripheral role. Not even Moretti, though, would
suggest that Schwarz and Grimm, the characters with the highest degree in the
network, should be regarded as the protagonists of the play because of their centrality.

16 Moretti's operationalization of co-presence differs slightly from the one described above, as he
created the networks manually. Therefore, he could choose to only connect two nodes when the
characters in question talk to each other.

17 Average distance (or average path length) is defined as the average number of steps needed
to travel from one node along the shortest paths to all other nodes of the network.

18 For a comparable position, see Trilcke and Fischer (2018: chap. 3).

19 The network's density (0.52), however, is quite similar to the average (0.48) and median
(0.46) density of 201 plays with a comparable network size (between 15 and 35 characters) taken
from the German Drama Corpus. A network's density is defined as the proportion of possible
edges that are actually realised. Cf. Wassermann and Faust 1994: 101.

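
The two measures defined in footnotes 17 and 19 can be sketched in a few lines of Python. The example is a hedged illustration on an invented four-node network, not a reproduction of the values reported for Die Räuber. It assumes an undirected network without self-loops and simply ignores unreachable pairs of nodes; the function names are illustrative.

```python
from collections import deque


def density(nodes, edges):
    """Proportion of possible undirected edges that are actually realised."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)


def average_path_length(nodes, edges):
    """Mean shortest-path distance over all reachable ordered pairs of nodes."""
    neighbours = {v: set() for v in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    total, pairs = 0, 0
    for source in nodes:
        # breadth-first search yields shortest distances in unweighted graphs
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(d for node, d in dist.items() if node != source)
        pairs += len(dist) - 1
    return total / pairs


# Invented toy network: a chain A - B - C - D
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]

d = density(nodes, edges)
apl = average_path_length(nodes, edges)
mean_degree = 2 * len(edges) / len(nodes)
print(d, apl, mean_degree / (len(nodes) - 1))
```

For a network of this kind, the density equals the mean degree divided by the number of nodes minus one, which is why density can be read as a degree normalized by cast size.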
Figure 2: Selected characters from Die Räuber and the number of words they utter throughout the play.


Looking at the number of words Schwarz (572) and Grimm (322) utter in the 
course of the play (see Figure 2), it becomes clear that their high degree values do 
not correlate with their impact on the play's action. Based on the numbers in Fig- 
ure 2, one would rather classify them as minor characters. Networks and their 
visualization should therefore always be contextualized and treated with caution 
in interpretation. 


5.2 Analyzing a bigger corpus of co-presence networks 


The real benefit of literary networks, however, has never been understood to be 
the analysis of single plays, but rather the investigation of a bigger corpus of 
plays (cf. Trilcke and Fischer 2018: chap. 3). In terms of the four dimensions of 
scalable reading, this corresponds to an upscaling of the object of study. The idea 
is to compare different plays that are modelled as networks based on data values, 
i.e., network measures such as the degree of a node. Then, it should be possible to 
detect patterns that reveal new insights into literary history and allow existing
hypotheses to be tested quantitatively - at least this is the expectation. Figure 3²⁰ shows
an example of a diachronic analysis based on 590 German plays (1730-1930)
taken from the German Drama Corpus. I have taken the plays’ average degree in
order to compute the mean values for every decade from 1730 until 1930 (black
line). From a descriptive point of view, one can see that the average degree slowly
rises from the later 18th century onwards. From around 1830 to 1880 only minor
fluctuations are visible. Then, the average degree starts to increase. This is
followed by a fairly pronounced drop and an equally noticeable rise. Peer Trilcke
and Frank Fischer have interpreted these average degree values as an indicator
that playwrights and their plays started to react to social
modernization and differentiation from the second half of the 18th century
onwards. They do, however, point out that this hypothesis is already well
established in literary history (cf. Trilcke and Fischer 2018: chap. 4.1).
Figure 3: Number of characters (grey) and average degree (black) of 590 German plays. The figure
shows mean values per decade.


But how convincing are these numbers? Do the values really support the
hypothesis? When looking at Figure 3, there seem to be some striking dependencies
between the plays' average degree and their number of characters.
This becomes particularly clear in the decades between 1890 and 1930. To check
on this visual impression further, I have calculated the Spearman correlation ρ
and the Pearson correlation r of several network metrics and the plays’ number
of characters. The values of Table 2 do indeed confirm the visual perception.
Across the computed network metrics, a relatively stable relation between the
number of characters and the metrics themselves emerges. From a conceptual
point of view, this makes sense: the more characters a drama has in total, the
higher is the potential degree of each character.²¹ Vice versa, as the number of
characters in a play increases, the probability that they will speak to all other
characters decreases. Hence, there is a negative correlation with the networks’
density. These observations seem to rest upon a poetological decision of the
playwrights. Oftentimes, extending a play’s cast list goes hand in hand with a
larger number of depicted locations, to which only a limited number of characters
have access. In principle, the network metrics as well as the examination by
Trilcke and Fischer mostly rest upon the changing numbers of the individual
play’s characters. There is nothing wrong with this observation. To showcase
these diachronic alterations in drama history, however, one hardly needs to rely on
network analysis.²²

20 Figure 3 reproduces an investigation of Trilcke and Fischer 2018, figure 6. See also Trilcke
et al. 2015.


Table 2: Correlation between the number of characters and several network 
metrics in 590 plays. 


               Ø Degree   Max degree²³   Density²⁴   Ø Path length

Spearman’s ρ     0.77        0.97         -0.75          0.72
Pearson’s r      0.41        0.76         -0.39          0.42
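
The difference between the two coefficients in Table 2 can be illustrated with a small sketch on invented numbers; it does not reproduce the corpus values. Spearman's ρ is computed here as Pearson's r on ranks (with average ranks for ties), which is the standard definition. The data and function names are assumptions made for illustration. One possible reading of a ρ that is markedly higher than r is a relation that is monotonic but not linear.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


# Invented data: the metric grows with cast size, but not linearly.
cast_sizes = [5, 10, 20, 40, 80]
metric = [1.0, 2.0, 3.0, 4.0, 5.0]

rho = spearman(cast_sizes, metric)
r = pearson(cast_sizes, metric)
print(round(rho, 3), round(r, 3))
```

In the toy data the ranking is perfectly preserved, so ρ is (up to rounding) 1, while r stays below it because the increase is not proportional.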


As my previous remarks and analyses have shown, it is difficult to make
productive use of co-presence networks at the typical meso-scale of literary
studies. Moreover, when zooming out, one must also contextualize the network metrics and
critically reflect on their values. Consequently, a scalable reading approach that
allows for the scaling of all four dimensions and not only focuses on either “units
that are much smaller” than the text or units that are “much larger” seems hard
to achieve when focusing on co-presence networks. Instead, co-presence networks
seem to be better suited for a reading at scale approach.²⁵ As Weitin postulates,
reading at scale leaves aside the idea of adjusting scales. Instead, it takes
into consideration the “trade-off” that different scales of analysis entail (Weitin
2021: 116). In his view, doing a quantitative corpus analysis means deciding on a
certain scope and a specific surrogate. Different text representations and methods
have different advantages and disadvantages, e.g., their comparability or their
context sensitivity (cf. Weitin 2021: 116). Therefore, they lead to different research
questions and answers. The four scales of scalable reading are each set to a
specific value: (1) choosing a specific surrogate; (2) selecting a fixed number of
objects to study; (3) settling for a specific unit of analysis; and (4) focusing on a
single text analytical method. In the case I have outlined above this corresponds
to: the co-presence of literary characters (1); 590 German plays (2); the dramatic
structure as an indicator of social modernization (3); and network analysis (4).

21 When normalizing the degree according to a play’s number of characters, the effect is
reversed.

22 Boris Yarkho (2019), for instance, has distinguished between classicist and romantic plays by
counting the number of characters that are speaking in the various scenes of a play. He then
compared the distribution of speech acts in around 200 plays.

23 Max degree only considers the play’s character with the highest degree value.

24 The density of a network is equivalent to normalizing the network's degree according to its
number of characters. This is said to ensure better comparability, as the values of the metric
range from 0 to 1. Cf. Jannidis 2017b: 153.


6 Outlook: Coreference networks as an alternative?


From the perspective of network analysis, Mueller’s concept of scalability seems to
fall flat - at least when looking at co-presence networks. Can we conclude, then, that
network analysis in general does not allow for a scalable reading in Mueller’s sense?
What will happen if we look beyond co-presence networks? As I have indicated
earlier, there are other possibilities to model the interaction of two literary characters.
One possibility, for instance, is to look at coreference chains.²⁶ The linguistic concept
of coreference refers to two or more expressions that have the same reference, i.e.,
the expressions point to the same character or object (cf. Crystal 2008: 116-117).
Lately, the automatic resolution of coreference chains has received a great amount
of attention within natural language processing (cf. Pagel and Reiter 2020: 56). For
the analysis of (German) literary texts, however, the performance of automatic
coreference resolution requires further improvements.²⁷ For my indicative analysis
and the creation of coreference-based character networks, I will therefore make use
of a manually annotated version of Schiller’s Die Räuber.²⁸

25 For the idea of differentiating scalable reading from reading at scale, see Weitin 2021: 116-145.
Weitin exemplifies his idea by applying Topic Modeling to a corpus of novels.

26 There are other, equally promising ideas for mapping forms of interaction onto the edges. Michael
Vauth (2019), for instance, directs his attention to narrative elements in Kleist's plays. Andresen
et al. (2022) make use of knowledge dissemination in plays to create character networks.

Figure 4 visualizes two networks based on these coreference annotations. The
networks focus on the four main characters of Die Räuber, namely the brothers Karl
and Franz Moor, their father (Der alte Moor) and Amalia. In these two networks, the
annotated coreferences are used to model the interaction of the characters. In doing so,
the perspective changes from who is talking with whom to who is talking about
whom. Each time one of the characters is mentioned in a speech act, a new edge
between the two characters is added. Unlike the co-presence networks, the
edges of the coreference networks have a direction to display who is talking and
who she or he is referring to. Figure 4 depicts two networks, one of which
concentrates on the play's fifth act (left), while the other displays the whole
play (right). When looking at the coreferences in the character speech of Franz Moor,
one can identify some instructive patterns. Compared to the other three characters
in the network, Franz's utterances in the course of the play feature many references,
especially targeting his father (276 references) and his brother Karl (314 references).
This is not surprising as his father and his brother are omnipresent in his
monologues and dialogues. When comparing the fifth act to the whole play, the references,
however, diminish. Only three references target Karl, only four his father. His
utterances, then, become much more self-centered. His thought process no longer
revolves around the perceived injustice of nature he had previously projected onto
other characters. As he dreams of judgment day, his own horrible intrigues collapse and
Franz commits suicide.²⁹
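
The construction of such directed networks can be sketched as follows. This is a hedged illustration, not the annotation format or procedure of the corpus by Pagel and Reiter: the record layout (act, speaker, referent), all values, and the function name `coreference_edges` are invented. Whether self-references appear as loops in the published networks is not stated here; the sketch keeps them as an assumption.

```python
from collections import Counter

# Hypothetical annotation records: (act, speaker, character referred to).
# All values are invented for illustration.
mentions = [
    (1, "Franz", "Karl"),
    (1, "Franz", "Der alte Moor"),
    (1, "Franz", "Karl"),
    (2, "Amalia", "Karl"),
    (5, "Franz", "Franz"),
    (5, "Franz", "Karl"),
]


def coreference_edges(mentions, act=None):
    """Count directed edges speaker -> referent, optionally for one act only."""
    edges = Counter()
    for mention_act, speaker, referent in mentions:
        if act is None or mention_act == act:
            edges[(speaker, referent)] += 1
    return edges


whole_play = coreference_edges(mentions)
fifth_act = coreference_edges(mentions, act=5)

print(whole_play[("Franz", "Karl")])  # 3
print(fifth_act[("Franz", "Karl")])   # 1
print(whole_play[("Karl", "Franz")])  # 0, because edges are directed
```

Restricting the same records to a single act is what allows the comparison between the fifth act and the whole play.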

I will not go into further detail on how to fully interpret the coreference
networks of Die Räuber. Instead, I will give an outlook on how a more tailored and
carefully considered modeling of literary networks might prove fruitful not only for
Mueller's vision of scalability, but also for a greater relevance of quantitative methods
to literary studies. As I have indicated, the coreference networks of Die Räuber
yield a promising perspective for various scales of observation. Although the
units of analysis, i.e., the coreference chains, are infratextual, certain patterns
emerge from them that allow for inferences on a bigger scale. The coreference
values, for instance, can be used to further characterize the relationship between
Karl and Franz.³⁰ As it is possible to contextualize and interpret the values in light of
the play's plot, they also seem to support the typical meso-scale of literary studies -
at least in principle. Moreover, the two coreference networks that I have compared
provide a clear change of state that warrants follow-up research. One can easily
change both the text representation and the method of analysis to supplement the
networks, e.g., by re-reading the relevant passages, or by comparing the
corresponding segments stylometrically or with the help of topic modeling. More
challenging, however, is the investigation of bigger corpora. Manually annotating every
single play of a large text collection is not feasible. Once the automatic resolution of
coreferences is further improved, this form of network analysis could function as a
prime example of what Martin Mueller had in mind when he was trying to
advocate a rethinking of the scalable dimensions of (computational) literary studies.

Figure 4: Coreference networks of Schiller's Die Räuber for the fifth act (left) and the whole play (right).

27 Fynn Schröder, Hans Ole Hatzel, and Chris Biemann (2021) report an F-score of about 0.65 for
the coreference resolution in German novels.

28 Pagel and Reiter have published a corpus of plays with manually annotated coreferences,
including Schiller's Die Räuber. For more details on the annotation process and the annotated data,
cf. Pagel and Reiter 2020: 56-60.

29 For a more detailed analysis and interpretation of this, see Krautter and Willand 2021:
128-131.

Florentina Armaselu
Text, Fractal Dust and Informational Granularity: A Study of Scale


Abstract: This chapter proposes a method of text analysis that combines concep- 
tual aspects from the model of scalable or zoomable text (z-text), topic modelling 
and fractal geometry. It argues that this type of methodology may assist in detecting 
different levels of generality and specificity in texts and reveal some characteristics 
of the assemblage of blocks of text, above the word level, at different scales of re- 
presentation. Applications of such an approach can range from hermeneutics and 
discourse analysis to text (and possibly z-text) generation and summarization. 


Keywords: scalable text, topic modelling, fractal geometry, informational granu- 
larity, digital hermeneutics 


1 Introduction 


Scale in text analysis has often been considered in relation to big collections of 
data and the possibility offered by digital methods and tools to provide insights 
into patterns, trends, outliers and linguistic phenomena that are hard to detect 
and cover by human reading alone. Several terms, such as distant reading, some- 
times opposed or compared to close reading (Moretti 2013; Underwood 2019), scal- 
able reading (Mueller 2014), macroscope (Hitchcock 2014) and long zoom (Johnson 
2007), have been coined to define this type of approach that allows for shifts from 
a bird’s eye view to individual details. However, what seems to have been less 
studied so far is the significance of the concept of scale and its possible applica- 
tions as an inherent feature of text itself. Under the magnifying glass, a text is far 
from being a flat conceptual structure; it may reveal a stratified organization 
with different layers of general and specific, abstract and concrete, simple and 
complex units of meaning. 

This chapter will focus on scale in textual forms, starting from the assump- 
tion that a text can be conceived as a scalable construct containing different lev- 
els of detail and can be explored by zooming in and zooming out (Armaselu 2010; 
Armaselu and Van den Heuvel 2017). It will investigate the possibility of computer-based
detection and analysis of scale-related structures in text, as well as
the potential meaning attached to these scales and forms of interpretation. 

The analysis will combine topic modelling (Blei 2011), for the preparation of 
the data, with the theory of fractals that is known for its applications in various 
domains, from mathematics and physics to statistical economics and linguistics. 
In his book The Fractal Geometry of Nature, Mandelbrot (1983) describes the con- 
cept of fractal, derived from the Latin fractus, frangere (to break), and its use in 
modelling highly irregular and complex forms from nature such as coastlines, 
clouds, mountains and trees, whose study goes beyond standard Euclidean geom- 
etry and dimensions. One of the fractal forms utilized as a model of a coastline at 
various scales is the Koch curve (Figure 1). The process of generating such forms 
starts with a straight interval called the initiator. A second approximation repla- 
ces the straight line with a broken line formed of “four intervals of equal length”, 
called the generator. New details such as promontories and subpromontories ap- 
pear through the iterative replacement of the generator’s four intervals by a re- 
duced generator. Although its irregularity is too systematic, the Koch curve is 
considered to be a “suggestive approximation” of a coastline (Mandelbrot 1983: 
34-35, 42-45).
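
The iterative replacement described above can be sketched in Python. The example is an illustration under simple assumptions: the initiator is the unit interval, points are represented as complex numbers, and the function names `koch_step` and `length` are invented. Each replacement step turns every segment into four segments of one third of its length, so the number of segments grows by a factor of 4 and the total length by a factor of 4/3; the corresponding similarity dimension is log 4 / log 3.

```python
from math import log


def koch_step(points):
    """Replace every straight segment by the four-segment generator."""
    new_points = [points[0]]
    for z0, z1 in zip(points, points[1:]):
        d = (z1 - z0) / 3                         # one third of the segment
        a = z0 + d                                # end of the first third
        b = a + d * complex(0.5, 3 ** 0.5 / 2)    # tip: d rotated by 60 degrees
        c = z0 + 2 * d                            # start of the last third
        new_points.extend([a, b, c, z1])
    return new_points


def length(points):
    """Total length of the broken line through the given points."""
    return sum(abs(q - p) for p, q in zip(points, points[1:]))


curve = [0 + 0j, 1 + 0j]  # initiator: a straight unit interval
for _ in range(3):        # three replacement steps
    curve = koch_step(curve)

segments = len(curve) - 1
dimension = log(4) / log(3)
print(segments, round(length(curve), 4), round(dimension, 4))
```

After three replacement steps the curve consists of 64 segments, and its length keeps growing with every further step although the curve remains within a bounded region.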


Figure 1: The first four iterations in building a Koch curve (adaptation of Sapoval 1989, 19):
a - first iteration, initiator; b - second iteration, generator; c, d - third and fourth iteration.


Applications of fractal theory in text analysis have exploited various aspects of 
the concept of scaling and self-similarity. In scaling systems, a part is in “some 
way a reduced-scale version of the whole”, and when “each piece of a shape is 
geometrically similar to a whole, both the shape and the cascade that generate it 
are called self-similar” (Mandelbrot 1983: 345, 34). Elaborating on Zipf's law (Zipf
2012: 23-27), which states the relationship between word frequencies and their ranks as
evidence of vocabulary balance in a text, Mandelbrot illustrated how constructs
such as “scaling lexicographical trees” provide generalized proofs of Zipf's law
while exhibiting scaling properties and fractal, non-integer dimensions
(Mandelbrot 1983: 346). Other studies have applied fractal-based methodologies to
automatic keyword extraction by assessing the “degree of fractality” of words as a
measure of their relevance and non-uniform versus uniform distribution in texts 
(Najafi and Darooneh 2015). Fractal indicators, computed by considering the se- 
quence of words and the number of letters in these words, have been used to 
compare the style of different types of texts, such as scientific, journalistic, conver- 
sational, epistolary and poetic (Kaminskiy et al. 2021). Fractal analysis has also been 
employed to determine the optimal number of topics based on the detection of 
“self-similar fractal regions” using a “density-of-states function” for texts in different 
languages (Ignatenko et al. 2019). More theoretical approaches have conceptualized 
language as a system that displays fractal features, such as “structural autosimilar- 
ity”, “fractal dimension” and “iterative order” in generating linguistic structures 
(Pareyon 2007) or “self-similar patterns” of discourse through the “process of identi- 
fying recursive semantic components” (Tacenko 2016). 
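
As a simple illustration of the rank-frequency relationship behind Zipf's law, the following sketch estimates the slope of the log-log rank-frequency curve by ordinary least squares. It is not one of the methods used in the studies cited above. The token list is synthetic, built so that the frequency of a word is exactly inversely proportional to its rank, and the function names are assumptions.

```python
from collections import Counter
from math import log


def rank_frequency(tokens):
    """Word frequencies sorted from most to least frequent (rank 1, 2, ...)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def log_log_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [log(f) for f in frequencies]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator


# Synthetic vocabulary: the word of rank i occurs 120 / i times.
tokens = []
for rank in (1, 2, 3, 4, 5, 6):
    tokens.extend([f"w{rank}"] * (120 // rank))

frequencies = rank_frequency(tokens)
slope = log_log_slope(frequencies)
print(frequencies, round(slope, 6))
```

A slope close to -1 corresponds to the idealized form of the law; real texts usually deviate from it, especially at the highest and lowest ranks.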

Although a variety of methods for the fractal-based processing or conceptual- 
ization of language have been proposed, mainly taking into account the composi- 
tion of texts as compounds including letters, syllables, words and sentences, the 
scalable nature of text and its stratified structure from a conceptual perspective 
has been less studied so far. In this chapter, I propose an approach that combines 
topic modelling techniques and fractal theory-related measures to detect different 
layers of generality and specificity and analyze a text at different scales. For this 
purpose, I use a corpus of texts from historiography, literature and philosophy. 
Section 2 will describe the initial assumptions about text scalability and the data 
used to test and assess the methods illustrated in Sections 3 and 4. Section 5 will 
present the results and possible interpretations of the approach, while Section 6 
will summarize the findings and propose hypotheses for future work. 


2 Datasets 


One area of research in digital history and humanities, that of global microhistory
(Trivellato 2011), was of particular interest for this study. The dataset used
in the experiments contains books considered representative for the objective of 
this type of research: 1688. A Global History (Wills 2001); Plumes (Stein 2008); The 
Inner Life of Empires (Rothschild 2011); The Two Princes of Calabar (Sparks 2004); 
and Vermeer’s Hat (Brook 2009). These books combine methods of analysis span- 
ning various conceptual levels, from micro to macro perspectives on the investi- 
gated historical phenomena, by connecting, for example: a series of paintings and 
art objects with the growth of trade and exploration in the seventeenth century
(Brook 2009); micro- and macro-histories through the history of a family's own 
connections (Rothschild 2011); micro-historical accounts with the history of en- 
slaved Africans in the early modern Atlantic world (Sparks 2004); or the perspec- 
tive of particular actors (people, commodities, one year in time) with the history 
of specific groups and cultures (Stein 2008; Wills 2001). For comparison purposes, 
one literary and one philosophical text were included in the dataset, Gulliver's 
Travels (Swift 2009) and Beyond Good and Evil (Nietzsche 2009), available via
Project Gutenberg (Hart 2004). The corpus was kept relatively small to allow for
closer analysis of the methodology as a proof of concept. The main question to
address was to what extent the applied digital methods were able to detect vari- 
ous conceptual levels in the studied texts. The historiography group of books was 
presumed to already possess such a variety given their analytical coverage rang- 
ing from broad overviews to detailed examination in their unfolding of argu- 
ments related to world history and microhistory. It was expected that the two 
other books, from literature and philosophy, would contain a certain type of strat- 
ification as well, as an inherent structure of text itself that would be revealed by 
the analysis. 

The books were divided into separate text files corresponding to chapters or 
parts (when chapters were too short), deemed as meaningful units of analysis for 
the exploration of scale in text. It was assumed that chapters and parts preserve a 
certain coherence and similarity in terms of varying degrees of generality and 
specificity in disclosing the content of the book. Figure 2 illustrates the structure 
of a book containing topics grouped on levels: from more general, representing a 
larger number of units, to more particular, mostly characterizing a single unit. 

Preliminary experiments consisted in reorganizing excerpts from the books 
as zoomable texts or z-texts¹ using a dedicated interface, the z-editor² (Armaselu
2010). The z-editor allows the user to start with a sequence of z-lexias³ on the
surface level and to expand or explore them by zooming in and out along the Z-axis
and adding or revealing details that belong to deeper levels. I referred to the cor- 
responding processes of expansion and exploration of z-lexias by zooming in and 
out as z-writing and z-reading. Each level of the structure corresponds to an XML- 
TEI file that stores the content and relations of parent and children z-lexias. For 
the author or reader of a z-text, the inner XML mechanism of the interface is 
transparent. 
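The parent and children relations between z-lexias can be pictured as a small tree. The following is a minimal sketch only, not the z-editor's actual XML-TEI model: the class name `ZLexia`, the helper `level_texts` and the sample fragments (paraphrased from the Brook example discussed in this section) are hypothetical and serve purely as illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ZLexia:
    """A fragment of text with the more detailed fragments it expands into."""
    text: str
    children: list["ZLexia"] = field(default_factory=list)


def level_texts(roots, depth):
    """Return the texts found `depth` zoom-ins below the surface level."""
    current = list(roots)
    for _ in range(depth):
        # One zoom-in: replace every z-lexia by its children.
        current = [child for node in current for child in node.children]
    return [node.text for node in current]


# Hypothetical three-level z-text, from world history down to a single painting.
surface = ZLexia(
    "Global cooling, plague and maritime trade",
    [ZLexia("The Little Ice Age in Northern Europe",
            [ZLexia("Vermeer's View from Delft")])],
)
```

Calling `level_texts([surface], 0)` returns the surface level, and each increment of `depth` corresponds to one zoom-in along the Z-axis.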


1 Accessed July 23, 2023. http://www.zoomimagine.com/AboutProject.html. 

2 Accessed July 27, 2023. http://www.zoomimagine.com/ZEditor.html. 

3 Fragments of texts as units in the writing or reading process, inspired by Barthes's (1974: 13) 
lexias, “units of reading”.
[Figure 2 diagram; row labels: Level 1 topics, Level 2 topics, ..., Level M topics.]


Figure 2: Structure of a book organized horizontally (left to right) by units (e.g. chapters, parts) and 
vertically (top-down) by conceptual level (e.g. from general to specific). 


Figure 3 (left) shows a z-text constructed with fragments from the first chapter of 
Brook’s (2009) book and the result of successive zoom-ins on a fragment from a his- 
torical standpoint. The exercise implied a preliminary interpretation of the book as 
a stratified representation of meaning. For instance, the top level in the View from 
Delft chapter contains fragments that describe events from a world history perspec- 
tive, such as global cooling, plague and maritime trade in the sixteenth and seven- 
teenth century. Details are added on the following levels: the focus gradually moves 
to more localized depictions of China's heavy frosts and the Little Ice Age in North- 
ern Europe to the winter landscapes by Pieter Bruegel the Elder in the Low Coun- 
tries and Vermeer's painting View from Delft. The painting is explained in more 
detail as containing several “doors” into the world of the seventeenth century. One
door is the herring boats captured in the picture, as evidence of the herring fishery 
moving south under the control of Dutch fishermen due to climate change. Another 
door is the home of the Dutch East India Company, the VOC, also visible in the pic- 
ture, which points to the network of trade that linked the Netherlands to Asia from 
the late sixteenth to the late eighteenth century. 

Figure 3 (right) illustrates another hypothesis following the storyline more 
closely. More precisely, the text can be restructured starting from the other
direction, i.e. the “doors” which are the paintings themselves corresponding to each
chapter, then zoom in to open those doors and gradually expand the text. For in- 
stance, the chapter one z-text unfolds from the artwork and its description through 
events of local history to the large-scale view on global cooling and world trade
development. The labels marked in grey in the figures indicate bibliographical notes
that contain pages in the book from where the fragments were extracted, such as 
pages 11, 12, 13 and Plate 1 (for the painting). The restructuring of the original text 
as a z-text presupposes that various clusters of meaning, belonging to different lev- 
els of detail, are scattered throughout the chapters of the book, not necessarily in 
contiguous areas. My assumption is that grouping them together by level on the 
same plane and stratifying the representation on several levels would provide new 
insights into the text and its complex multi-layered structure. This implies various 
conceptual scales dispersed through fragments that link to each other horizontally 
and vertically over shorter and longer distances. The methodology applied to detect 
this type of structure has followed this intuition. 


3 Topic modelling 


For the preparation and first phase of analysis of the corpus, I used MALLET 
(McCallum 2002), a software package applying latent Dirichlet allocation (LDA) 
for topic modelling (Blei 2011), combined with Microsoft Excel functions and
Visual Basic for Applications (VBA) procedures that I created for the project.⁵ The
choice of MALLET and Excel was driven by their accessibility and the possibility 
of creating output and diagnosis files for further analysis and processing. How- 
ever, the methodology should be applicable to other types of software as well (for 
instance, to an integrated Java package that may implement the proof of concept 
described in this chapter in a second phase of the project). 


3.1 Entropy 


Each book from the dataset was analyzed with MALLET.⁶ Each folder,
corresponding to a book organized into files for chapters or parts, was imported via
import-dir with the options keep-sequence and remove-stopwords (strings were 
converted to lower case by default). The topic models were built with train-topics 


5 Experiments with hierarchical LDA (hLDA) (Blei et al. 2009) were ongoing at the time of writ- 
ing and are not described in this paper. 

6 For more details about the options used for analysis, see Graham et al. (2012) and the online 
MALLET documentation at https://mimno.github.io/Mallet/topics and https://mallet.cs.umass.edu/ 
diagnostics.php, accessed July 27, 2023. 
including the options output-state, output-topic-keys and diagnostics-file to
produce a series of XML and tab/space delimited files. The resulting data were
imported into Microsoft Excel for processing through built-in functions and VBA
procedures that I wrote for this purpose. After a set of tests with various numbers 
of topics (8, 10, 15, 20, 25) and analysis of topic quality, the number of topics was 
empirically set at 20, with an optimize-interval value of 20 and the default value 
of 20 for num-top-words. The decision was based on the observation that the num- 
ber of 20 topics produced a topic distribution that included at least two dominant 
topics appearing in almost all the chapters/parts of the books from the collection 
considered in the study. This observation was considered as a first indicator of a 
structure layered from general to specific. 
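The two MALLET steps described above can be summarized as command lines. The sketch below only builds the argument lists and does not run anything. The file names are hypothetical, and the use of `--output-doc-topics` to obtain the composition file is my assumption, since only output-state, output-topic-keys and diagnostics-file are named explicitly in the text.

```python
from pathlib import Path


def mallet_commands(book_dir, out_dir, num_topics=20):
    """Build the import and training command lines for one book (not executed)."""
    out = Path(out_dir)
    corpus = out / "book.mallet"  # hypothetical name for the imported corpus
    import_cmd = [
        "mallet", "import-dir",
        "--input", str(book_dir),      # folder with one file per chapter or part
        "--output", str(corpus),
        "--keep-sequence",
        "--remove-stopwords",
    ]
    train_cmd = [
        "mallet", "train-topics",
        "--input", str(corpus),
        "--num-topics", str(num_topics),
        "--optimize-interval", "20",
        "--num-top-words", "20",
        "--output-state", str(out / "state.gz"),
        "--output-topic-keys", str(out / "topic_keys.txt"),
        "--output-doc-topics", str(out / "composition.txt"),  # assumed flag
        "--diagnostics-file", str(out / "diagnostics.xml"),
    ]
    return import_cmd, train_cmd
```

Each list could be passed to `subprocess.run` on a machine where the `mallet` launcher is on the path; varying `num_topics` over 8, 10, 15, 20 and 25 reproduces the range of tests mentioned above.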


Figure 4: Topics sorted by document entropy in Brook (2009), with topic probabilities (vertical axis)
and their distribution by chapter (1-8, horizontal axis).


The goal of post-processing the MALLET files was to devise a methodology for de- 
tecting the levels of generality and specificity that characterize each book. For 
this purpose, I used the topic distribution per document (chapter or part) from 
the composition file, combined with the document entropy measure from the di- 
agnostics file. Topics with low entropy values are concentrated in a few docu- 
ments, while topics with higher entropy values are spread evenly over many 
documents (MALLET documentation — Topic model diagnostics). This metric was 
considered as an indicator of generality versus specificity within the chapters/ 
parts of the books included for analysis. 
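To make the behaviour of this metric concrete, the sketch below computes a Shannon entropy over the share of a topic's tokens found in each document. It is an illustration of the idea, not MALLET's own code; the natural logarithm and the input format (one token count per chapter) are assumptions.

```python
import math


def document_entropy(tokens_per_document):
    """Entropy of one topic's token counts across documents (chapters or parts)."""
    total = sum(tokens_per_document)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in tokens_per_document:
        if count > 0:
            share = count / total          # share of the topic's tokens in this document
            entropy -= share * math.log(share)
    return entropy


# Hypothetical counts over eight chapters.
specific_topic = [400, 0, 0, 0, 0, 0, 0, 0]   # concentrated in one chapter
general_topic = [50] * 8                      # spread evenly over all chapters
```

A topic confined to one chapter yields an entropy of 0, while a topic spread evenly over eight chapters reaches the maximum, log(8); this is the contrast between specific and general topics used in the rest of the section.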
Figure 4 shows the topic distribution per chapter for Brook (2009), with topics
sorted in descending order of their document_entropy. One can observe that 
topics T12 and T11 (top of the bars), and to a lesser degree T19 and T9, are the 
ones spread throughout the chapters, while topics such as T18, T15 and T13, at the 
other end of the spectrum (bottom of the bars), are mostly concentrated in a sin- 
gle chapter (chapters 1, 2 and 3 respectively). Intermediate topics are represented 
by thinner strips in the middle area of the bars. 

Figure 5 presents the topic distribution by chapter (Brook 2009) for each of 
the 20 topics, arranged from more general to more specific (left to right and top- 
down). Table 1 shows excerpts of top words for the most generic and most specific 
topics and the chapters where these topics are prominent. 

The first two topics (T12, T11) are almost evenly distributed throughout the 
chapters of the book. They are part of Brook's recurrent argumentation that out- 
lines the emergence of global trade in the seventeenth century, connecting Eu- 
rope with the world. Narrower descriptions of particular events, developed in 
relation with the eight paintings by Vermeer and other artworks, stand for
articulation points chosen by the author as “doors” or “passageways” to the
seventeenth-century world for each chapter (e.g. T18, T15, T13). Intermediary topics (e.g.
T17) that cover fewer chapters (but more than one) appear to be less coherent⁷
and probably refer in the texts to fragments that make the transition
between more general and more specific themes.


3.2 Levels 


To detect the number of levels of generality and specificity inherent to a text and 
assign topics to such levels, I used the document entropy metric and the computa- 
tion of the slope (Excel built-in function) as a measure of the generality/specificity 
variation from one topic to another in the graph (Figure 6). 

I considered that two adjacent topics $T_i$, $T_j$ belong to the same level if the
absolute value of the slope computed using their corresponding document entropy is
less than the value of the average interval, computed as (max - min) of the document
entropy divided by the number of topics.⁸ The resulting mapping of topics to levels is
shown in Table 2. One can observe that same-level topics tend to appear together
on a plateau, while a change of level is marked by steeper or longer slope lines in the

7 Also according to the MALLET coherence indicator computed for the topics. 
8 Except for the two most general topics that were considered by default as belonging to two 
separate levels, 1 and 2. 
Figure 5: Topic (T0-T19) distribution by chapter (1-8)⁹ in Brook (2009).
Table 1: Most generic and most specific topics. Excerpts from Brook (2009).

Topic  Top 20 words                                                        Distribution by chapter

T12    century world dutch time vermeer seventeenth painting years         Even
       people delft place trade life voc men things long great sea side

T11    chinese china back made european trade europe europeans             Even
       south coast end make called ship people spanish found needed
       portuguese japan

T17    gunpowder fashion bound kill feet boundaries fishing passage        Mostly in 2 VermeersHat, 4 GeographyLessons and 7 Journeys
       role hung dutchmen rebel lack supplies semedo scattered
       correctly infiltrating reaching europe's

T18    delft paintings shanghai view canal pearl rotterdam built           Mostly in 1 TheViewFromDelft
       schouten chamber schiedam web buildings oude surface cold
       herring kolk contacts dong

T15    champlain french lake beaver native arquebus champlain's huron      Mostly in 2 VermeersHat
       mohawks hurons allies felt hat montagnais lawrence hats dream
       iroquois chiefs map

T13    porcelain white objects ships dishes lion dutch wen potters pieces  Mostly in 3 ADishOfFruit
       amsterdam taste voc cargo dish portuguese produced ceramic
       lam blue


diagram. The topic levels also seem to be correlated with the degree of generality/ 
specificity or distribution by chapter shown in Figure 5. 
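The level-assignment rule can be sketched as follows. This is a minimal reading of the procedure, assuming that topics are placed at unit spacing on the horizontal axis, so that the slope between two adjacent topics equals the difference of their document entropies. The function name and the sample values are hypothetical.

```python
def assign_levels(entropy_by_topic):
    """Map each topic to a level, from general (1) to specific (highest number)."""
    # Rank topics by decreasing document entropy, as in Figure 6.
    ranked = sorted(entropy_by_topic.items(), key=lambda item: item[1], reverse=True)
    values = [value for _, value in ranked]
    # Average interval: (max - min) of document entropy divided by the number of topics.
    threshold = (max(values) - min(values)) / len(values)
    levels = {}
    level = 0
    for index, (topic, value) in enumerate(ranked):
        if index < 2:
            # The two most general topics form levels 1 and 2 by default.
            level += 1
        else:
            slope = value - values[index - 1]  # unit spacing between adjacent topics
            if abs(slope) >= threshold:
                level += 1                     # steep drop: a new level starts here
        levels[topic] = level
    return levels


# Hypothetical entropies for five topics.
sample = {"A": 2.0, "B": 1.9, "C": 1.85, "D": 1.0, "E": 0.95}
```

With these sample values the threshold is 0.21, so C stays on the same plateau as B, the drop from C to D opens a new level, and E joins D.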

Once the topics were assigned to levels, these levels were propagated to all 
the words in the text belonging to the topics. For this purpose, I used Excel to fur- 
ther process the MALLET output obtained via output-state, which is a file contain- 
ing the words of the corpus (book), after stopword removal, with their topic 
assignments, index and position within each document (chapter or part of the 
book). In this way, each word was assigned to the level of the corresponding 
topic. Since the analysis of text as a scalable structure was intended for units of 
text larger than words, I created a set of procedures in Excel VBA to propagate 
levels from words to segments of a given length in number of MALLET words.¹⁰


10 That is, the number of word tokens after stopword removal by MALLET (this is how words 
are also referred to for the rest of the chapter). 


298 —— Florentina Armaselu 


[Figure 6 plot; horizontal axis, topics in decreasing order of document entropy: T12, T11, T19, T9, T3, T0, T17, T5, T4, T10, T8, T1, T7, T2, T18, T6, T13, T14, T15, T16.]


Figure 6: Topics (T0-T19) in decreasing order of their document entropy (vertical axis) in Brook 
(2009). 


Table 2: Topic to level mapping, from generic
to specific (top-down), in Brook (2009).

Topic                            Level

T12                              1
T11                              2
T19                              3
T9, T3, T0, T17, T5, T4, T10     4
T8                               5
T1, T7, T2, T18                  6
T6, T13, T14, T15, T16           7


First, the probability of each word to belong to a topic was computed by counting 
the number of times a word w was assigned to topic t and dividing this value by 
the total number of words assigned to that topic. Then, given the length of a seg- 
ment s defined as a number of words inside a document, a score was computed 
for each level according to the following formula: 


\[
\mathrm{score}(s, l) =
\frac{\mathit{count\_lvl}_{l\,\mathrm{in}\,s}}{\mathit{seg\_size}_{s}}
\times
\frac{\mathit{avg\_prob\_word\_topic}_{l\,\mathrm{in}\,s}}{\mathit{avg\_word\_distance}_{l\,\mathrm{in}\,s}}
\tag{1}
\]

where: $\mathit{count\_lvl}_{l\,\mathrm{in}\,s}$ is the number of times level l appears in segment s (i.e. the
number of words assigned to level l in segment s); $\mathit{seg\_size}_{s}$ is the size in number
of words of segment s; $\mathit{avg\_prob\_word\_topic}_{l\,\mathrm{in}\,s}$ is the average word-topic
probability for words belonging to l in s; $\mathit{avg\_word\_distance}_{l\,\mathrm{in}\,s}$ is the average distance
between words belonging to l in s.

A segment containing words from different levels is therefore assigned to the 
level that has the highest score according to (1). This is the level that appears 
many times in the segment and has many words assigned to it, whose average 
probability of words belonging to that level is higher, and which involves words 
that appear grouped together at smaller distances (presumed to form more com- 
pact clusters of meaning). In the case of Brook (2009), seven levels were detected 
(Table 2) and propagated to segments by applying this method. 
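The scoring step can be illustrated with a short sketch. It is not the Excel VBA code used for the study. Two points are assumptions of mine: the average distance is taken as the mean gap between consecutive positions of words of the same level, and a level represented by a single word in the segment is given the segment length as its distance. All names and the sample data are hypothetical.

```python
from collections import Counter


def word_topic_probabilities(assignments):
    """P(word | topic): times a word was assigned to a topic / words assigned to that topic."""
    pair_counts = Counter(assignments)
    topic_totals = Counter(topic for _, topic in assignments)
    return {(w, t): n / topic_totals[t] for (w, t), n in pair_counts.items()}


def level_scores(segment, topic_level, probabilities):
    """Score every level present in a segment, following formula (1)."""
    positions, probs = {}, {}
    for position, (word, topic) in enumerate(segment):
        level = topic_level[topic]
        positions.setdefault(level, []).append(position)
        probs.setdefault(level, []).append(probabilities[(word, topic)])
    scores = {}
    for level, where in positions.items():
        count = len(where)
        avg_prob = sum(probs[level]) / count
        if count > 1:
            gaps = [b - a for a, b in zip(where, where[1:])]
            avg_distance = sum(gaps) / len(gaps)
        else:
            avg_distance = float(len(segment))  # assumption for a lone word
        scores[level] = (count / len(segment)) * (avg_prob / avg_distance)
    return scores


def assign_segment(segment, topic_level, probabilities):
    """Return the level with the highest score for this segment."""
    scores = level_scores(segment, topic_level, probabilities)
    return max(scores, key=scores.get)


# Hypothetical four-word segment with (word, topic) assignments.
segment = [("trade", "T12"), ("world", "T12"), ("delft", "T18"), ("trade", "T12")]
topic_level = {"T12": 1, "T18": 6}
probabilities = word_topic_probabilities(segment)
```

In this toy segment level 1 wins because it has more words and they sit close together, even though the single level 6 word has a higher word-topic probability. In practice the probabilities would be computed over the whole book rather than over one segment.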

Tests were run for different segment sizes, from one segment of the size of each 
book, then sizes iteratively divided by 4 up to 1,024 (six iterations), in a process sim- 
ilar to the generation of the Koch curve that divides the segments by 4 at every itera- 
tion. Segment counting was reset at the beginning of each unit (chapter or part), 
except for the first iteration when a single segment of the length of the book was 
considered. Thus, segments of different sizes could result from an iteration (either 
for values larger than a unit size, when the actual size of the segment was the unit 
size, or for segments placed at the end of the unit containing the remaining words 
after the division corresponding to the iteration). The process was intended to simu- 
late, by iterative reductions of the segment size, the representation of text at various 
scales, revealing a stratification by levels and a fragmented rather than flat structure 
where all the components are placed on a single line. The segment-level diagrams in 
Figure 7 were computed in Excel following the method for step charts without risers 
(Peltier 2008). It was observed that for large segment size values (large scale), when 
one segment covers a full unit (chapter or part), the assigned level can differ from 
unit to unit, and it is not always a level corresponding to the most general topics, as 
would have been expected. Sometimes it may be a specific level or, less often, an 
intermediate level. Figure 8 displays the detail of the word distribution by level for 
the first 11 words and the first segment of size 35, 140 and 560 words in Brook's book, 
chapter 1. We can compare it with the three bottom diagrams from Figure 7 (read 
from right to left). According to the score computed by formula (1), segment 1 is as- 
signed to level 6 when considered at a small scale (segment size: 35 words) and to 


11 With the ratio of $2^{2(k-1)}$, where k = 1, 2, ..., 6 represents the number of the iteration.

12 For simplification, all the segments are represented equally. Segment i spans i to i+1 (starting 
with 1), where i stands for the numerical labels on the horizontal axis of segments. For visibility 
and analogy purposes, the segments were represented as 15pt-wide bars (instead of points) in the 
Excel diagrams. The vertical axis represents the levels, from 1 to 7 for Brook's book. 
level 1 when the scale increases further (140, 560 words). For larger scales (Figure 7,
top, right to left), the first segment remains at level 1 for the next two iterations, but 
is assigned to level 7 when a single segment of the size of the whole book is consid- 
ered. This way of looking at the text as made of building blocks of increasing size as 
the observation scale increases can provide insights into the mechanisms of meaning 
production which involve assembling words with different degrees of generality and 
specificity to form more complex units. The specificity or generality of these higher 
order units, such as sentences, groups of sentences or paragraphs, chapters, parts 
and whole book, could therefore be detected and mapped on different levels, reveal- 
ing a stratified conceptual structure rather than a linear layout. 
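The iterative reduction of the segment size can be sketched as below. The sketch assumes integer (floor) division, which reproduces the sizes quoted for Brook's book; the unit lengths in the example are hypothetical. The first iteration, where the whole book is one segment, is left to the caller.

```python
def segment_sizes(total_words, iterations=6):
    """Segment size at each iteration: the book size divided by 1, 4, 16, ..."""
    return [total_words // 4 ** k for k in range(iterations)]


def split_units(unit_lengths, size):
    """Cut each unit (chapter or part) into segments, restarting at every unit."""
    result = []
    for length in unit_lengths:
        full, remainder = divmod(length, size)
        segments = [size] * full
        if remainder:
            segments.append(remainder)  # shorter segment at the end of the unit
        result.append(segments)
    return result


brook_sizes = segment_sizes(35893)
```

For a book of 35,893 words this gives 35,893, 8,973, 2,243, 560, 140 and 35. A unit shorter than the current size becomes a single segment of the unit's own length, which is why segments of different sizes can coexist within one iteration.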


[Figure 7; recoverable panel titles: Brook (2009), segments by level, for ε = 35,893, ε = 8,973, ε = 2,243 and ε = 560.]

Figure 7: Segment distribution (horizontal axis) by level (vertical axis) at different scales, with ε the
size of the segment in number of words (Brook 2009).


If we read Figure 7 in reverse order, we can interpret the progression left to right 
and top-down by analogy with a process in physics. First, the whole text-bar is 
assigned by the algorithm to the most specific level 7. By exposure to external
factors (in our case the analysis at different scales),¹³ the text is broken into smaller
and smaller units of analysis, which seems to increase the mobility of the
resulting segments and their migration to more generic levels.

From a conceptual point of view, it therefore appears that the focus of Brook’s 
book gradually moves from specific to generic (or from analysis to synthesis) with 
the decrease in size of the investigation unit. Following this line of thought, which 
seems to align with the storyline z-text layout depicted in Figure 3 (right), we may 
infer that the strategy of argument unfolding of the book is primarily articulated, 
when considered at a large scale, around the concept of “doors”, symbolized by the 
eight artworks corresponding to each chapter. This conceptual framework, corre- 
sponding to the microhistory level, is gradually enriched and contextualized within 
a global-history perspective that depicts the “dawn of the global world in the
seventeenth century” through storytelling and synthesis inserts that become more
“visible” or prominent at intermediary and smaller scales of analysis. In his study of
semantic similarity in a taxonomy, Resnik (1995) proposes a measure to quantify
the “information content” of a concept according to its position in a taxonomy (e.g.
the top concepts being more abstract). Starting from this idea but applying it to tex- 
tual fragments, I will use the term informational granularity map for the diagram 
depicted in Figure 7 that graphically describes the degrees of generality and speci- 
ficity characterizing the units of analysis at different scales of representation. Gran- 
ularity is understood both as a measure of the scale of observation, represented 
through the segment size, and as an expression of the degrees of generality and 
specificity of the content itself. Section 5 will provide a comparison with the infor- 
mational granularity maps built for the other books in the collection (Figure 15). 


4 Fractal geometry 


This stratification by levels suggested that texts might possibly be interpreted as 
fractals; that is, as irregular, non-linear forms characterized by a certain degree 
of self-similarity. In this view, a text, first considered in its entirety as a single 
segment of words and a single conceptual unit, becomes scattered along several 
conceptual lines when iteratively broken into smaller segments and analyzed at 
decreasing scales. The question was whether this type of structure can be more 


13 In physics this may correspond to exposure to radiation or higher temperatures that produ- 
ces segment fragmentation. 

14 It should be noted, however, that the z-text model operates with segments of the order of 
several sentences or one or two paragraphs. 
formally portrayed as a fractal and if it is possible to determine its degree of
irregularity as measured through a fractal dimension. This measure may then be
used to compare the fragmentation of different texts by level and as a possible 
indicator of their complexity in terms of multi-layered rather than linear struc- 
tures. To do this I applied a method called box counting that is used in mathemat- 
ics, physics and other natural sciences to detect the fractal dimension of irregular 
shapes (Falconer 2014; Gouyet 1996). 


4.1 Fragmentation 


I considered text representations for the corpus like the one shown in Figure 7. A
grid of squares of side ε can be imagined as covering the image at every scale,
where ε iteratively takes different values for each of the analyzed books.¹⁵ The
algorithm consists in counting the number N(ε) of squares (or boxes) of side ε
that intersect the text shape, for grids of different granularity.

Figure 9 shows an example of box counting for Brook's book and a box of
side ε = 8,973 words. The number of boxes N(ε) in this case is six, that is the
number of squares in the grid that contain segments of text. At this coarser
granularity, only three of the seven levels detected for the book are occupied with
segments, the most general level (L1) and two of the most specific ones (L6, L7).
The segments and boxes were modelled in Excel as integer intervals taking
account of the number of words in the segments and their succession, and the (i, j)
pairs, where i represented the column and j the row corresponding to a box. The
total number of boxes in a grid at a certain scale was defined based on the scale
factor $s = 2^{2(k-1)}$ plus 1, where k was the number of the iteration. For instance, at
the iteration k = 2, the total number of words for Brook's book (35,893) was
divided by the scale factor $s = 2^2 = 4$, resulting in segments of ε = 8,973 words.¹⁶
Since the division might not always be exact, the scale factor was increased by 1
to include the remaining words of the last unit. Thus, a grid of (s+1)*(s+1) = 5*5 =
25 boxes was devised. The position of the levels and the level interval w on the
vertical axis were determined by dividing the maximum value of the squared
grid by the number of levels $N_l$. For the Brook example in Figure 9, the level
interval w = [(s+1)*ε]/$N_l$ = (5*8,973)/7 = 44,865/7 ≈ 6,409.
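The grid arithmetic of this example can be checked with a few lines of code. The sketch assumes that the scale factor at iteration k is 2^(2(k-1)), which gives s = 4 at k = 2 as in the worked example, and that divisions are truncated to whole words; the function name is hypothetical.

```python
def grid_parameters(total_words, k, n_levels):
    """Box side, number of boxes and level interval for iteration k."""
    s = 2 ** (2 * (k - 1))                   # assumed scale factor: 1, 4, 16, ...
    eps = total_words // s                   # box side in words
    cells_per_side = s + 1                   # one extra cell for the remaining words
    w = (cells_per_side * eps) // n_levels   # vertical interval between levels
    return {"s": s, "eps": eps, "boxes": cells_per_side ** 2, "w": w}


brook_k2 = grid_parameters(35893, k=2, n_levels=7)
```

For Brook's book at the second iteration this yields a box side of 8,973 words, a grid of 25 boxes and a level interval of 6,409, matching the values shown in Figure 9.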


15 E.g. 35,893; 8,973; 2,243; 560 . . . words for Brook's book, corresponding to the six iterations,
Section 3.2.

16 At this scale, ε exceeded the size of each unit (chapter) and the actual segment sizes
corresponded to the sizes of the eight chapters of the book, labelled $S_1$ to $S_8$ in the figure.
[Figure 9 panel: Box counting, segments by level, ε = 8,973, w = 6,409 (Brook 2009); horizontal axis ticks at 0, 8,973, 17,946, 26,919, 35,892 and 44,865.]

Figure 9: Example of box counting, second iteration, k = 2 (Brook 2009).


The process was repeated and the number of boxes intersecting the text shape
was calculated for six iterations (k = 1-6). I then built the diagram for log(1/ε) and
log(N(ε)) (Figure 10). Usually, the fractal literature considers that if this curve
exhibits an approximately linear behavior (at least for a certain subset of the plane),
then N(ε) obeys a power law of the form $N(\varepsilon) = c\,\varepsilon^{-D}$, where c is a constant.
Therefore, the studied object may be assumed to possess fractal properties in the linear
region, and D represents its fractal dimension.¹⁷ Intuitively, D reflects how the
number of counted boxes grows with the decrease in box side, the way in which the
analyzed object fills the space, the ratio of change in detail to change in scale, or
the inherent complexity of an irregular form (Falconer 2014: 27-28; Karperien and


17 Or box-counting dimension. There are different types of fractal dimension. See Mandelbrot 
(1983) and Falconer (2014) for a survey of these types and their degree of equivalence. 
Jelinek 2016: 20). Various applications of the box-counting method to images either
considered a grid of iteratively reduced size overlaid on the same image (Ostwald
and Vaughan 2016) or used mathematical functions instead of pixel pictures to
eliminate the distortions due to zoom-in (Wu et al. 2020). Unlike the image-based
approach, I considered that the representation of text at each scale also changes
with the side of the grid cell ε (according to the algorithm described in Section 3.2) and I
worked with an interval-based modelling of the boxes and segments in Excel. This 
type of representation was intended to capture the disposition of segments on lev- 
els at various scales by simulating the effect of a zoom-in that makes more and 
more details visible as the scale decreases, and to test the application of the power 
law for fractal behavior on these different scale-driven configurations. 
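The link between the power law and the straight line in the double-log plot follows from taking logarithms on both sides. The short derivation below uses only the symbols already introduced (N, ε, c, D):

```latex
\[
N(\varepsilon) = c\,\varepsilon^{-D}
\quad\Longrightarrow\quad
\log N(\varepsilon) = \log c + D \,\log\frac{1}{\varepsilon}
\]
```

Plotted with log(1/ε) on the horizontal axis and log(N(ε)) on the vertical axis, a power law therefore appears as a line whose slope is D and whose intercept is log c.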

In practice, D is calculated as the slope of the linear region of the graph (Figure 10) 
for a certain number of iterations. Studies in a variety of research fields have 
shown that despite its relative simplicity, the box-counting method presents a series 
of drawbacks. For instance, the value of D varies with the range of box sides, the 
number of iterations¹⁸ and certain characteristics of the grid or image to be
analyzed (positioning, resolution) (Datseris et al. 2021; Ostwald and Vaughan 2016;
Harrar and Hamami 2007; Klinkenberg 1994). In their analysis of fractal patterns of
words in a text, Najafi and Darooneh (2015) observe that detecting the fitting range 
in the log-log plot of the number of filled boxes against box side, and the fractal 
dimension as the slope of the line of best fit is quite challenging to do automatically. 
Other studies have pointed to the need to provide other statistical measures, such 
as the correlation coefficient, mean and standard deviation over multiple samples 
used to compute the fractal dimension, to assess the accuracy and limitations of the 
model (Karperien and Jelinek 2016: 23-26). 

To estimate the fractal dimension, I applied the method of least squares to
compute the slope of the log(1/ε) vs log(N(ε)) graph (Harrar and Hamami 2007) and
computed related statistical measures as first accuracy estimators. Figure 10 displays
a 1.0233 slope and 4.7572 intercept (left) and the variation of the number of filled
boxes with box side (right). The R² statistic shows that a proportion of 0.9969 of the
variability in Y = log(N(ε)) can be explained using X = log(1/ε); it is a measure of their
linear relationship and correlation in the sample. A value close to 1 indicates that a
large proportion of the variability in the response has been explained by the regression,
while a number near 0 suggests the opposite (James et al. 2017: 69-71). The FDIST statistic
estimates the probability that the observed relationship between the two variables


18 For instance, Ostwald and Vaughan (2016: 40) recommend “at least eight and preferably ten
or more comparisons” for better accuracy, to reduce the error rate to “around ±1% or less”, in
their study of the fractal dimension in architecture.
Figure 10: Fractal dimension as the slope of the double-log plot, log(1/ε) (horizontal) vs log(N(ε))
(vertical) (top); box side (horizontal) vs number of filled boxes (vertical) (bottom) for iterations k = 1
to 6 (Brook 2009).
occurs by chance, which is very low (3.55009E-06) in this case. Although the
influence of some factors on the box-counting dimension, such as the side of the box and
the number of iterations, requires further analysis, I considered the values obtained 
through this approach as a rough approximation of the fractal dimension and basis 
of comparison for the texts included in the study (see also Table 4, Section 5.2). 
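The least-squares estimate of the slope can be sketched in a few lines. This is an illustration rather than the spreadsheet procedure used for the study. The box counts below are synthetic values generated from an exact power law with D = 1, not the counts measured for Brook's book, and the base-10 logarithm is an assumption (the slope does not depend on the base, the intercept does).

```python
import math


def fit_loglog(box_sides, box_counts):
    """Least-squares line through (log(1/eps), log(N)); returns slope, intercept, R squared."""
    xs = [math.log10(1.0 / side) for side in box_sides]
    ys = [math.log10(count) for count in box_counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # estimate of the fractal dimension D
    intercept = mean_y - slope * mean_x     # estimate of log(c)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Synthetic data only: N = 50000 * eps^(-1) at the six box sides used for Brook's book.
sides = [35893, 8973, 2243, 560, 140, 35]
counts = [50000.0 / side for side in sides]
slope, intercept, r_squared = fit_loglog(sides, counts)
```

On this synthetic input the fit recovers a slope of 1 and an R² of 1; with measured box counts the slope would be the estimated dimension and R² would indicate how well the power law holds over the chosen range of box sides.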

Once the dimension value had been computed, another question needed to 
be addressed, i.e. how this dimension could be interpreted within the fractal the- 
ory framework. Mandelbrot (1983) generically called dust the fractals with dimen- 
sions in the interval 0 to 1. A classic example from this category is Cantor dust, 
whose generation is illustrated in Figure 11. Its construction starts with a straight 
line as an initiator, followed by a generator obtained by removing the middle 
third from the initiator. The process is repeated at smaller and smaller scales, by 
continuing to delete the middle third. 
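The construction can be reproduced with a short sketch. The function names are hypothetical, and `iterations` counts the removal steps applied to the initiator. The dimension is computed with the standard similarity formula log(number of copies) / log(scaling factor), which for two copies at one third of the size gives the value usually quoted for Cantor dust.

```python
import math


def cantor_intervals(iterations):
    """Intervals left after repeatedly removing the middle third of [0, 1]."""
    intervals = [(0.0, 1.0)]                 # initiator
    for _ in range(iterations):
        remaining = []
        for start, end in intervals:
            third = (end - start) / 3
            remaining.append((start, start + third))   # left third is kept
            remaining.append((end - third, end))       # right third is kept
        intervals = remaining
    return intervals


def similarity_dimension(copies, scale):
    """Dimension of a self-similar set made of `copies` pieces, each reduced by `scale`."""
    return math.log(copies) / math.log(scale)


cantor_dimension = similarity_dimension(2, 3)   # two copies, each one third as long
koch_dimension = similarity_dimension(4, 3)     # four copies, each one third as long
```

Each removal step doubles the number of intervals while shrinking them to a third of their length, which is the regular, invariant pattern that the score-driven segmentation of a text does not follow.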


Figure 11: Cantor dust (adaptation of Mandelbrot 1983: 80): a - first iteration, initiator; b - second
iteration, generator; c, d, e - third, fourth and fifth iteration.


While the Koch curve (Figure 1) is a fractal with an approximate dimension of 
1.2618, Cantor dust has a fractal dimension of 0.6309 (Mandelbrot 1983: 36, 80). Two-
or three-dimensional Cantor dusts or sets can also be generated, with fractal dimensions
above 1, such as 1.2619 and 1.8928 respectively (Sławomirski 2013; Tolle et al.
2003). One can observe a certain similarity between Cantor dust (Figure 11) and the 
segments of Brook's book represented at smaller and smaller scales (Figure 7). Some 
differences may be noted as well. First, the appearance of the latter is not as regular 
as that of the former since the textual segments and their distribution were gener- 
ated through a score-driven procedure with variable results rather than the applica- 
tion of an invariant pattern of iteration. Second, Cantor dust corresponds to a linear 
structure with the matter from the gaps being gradually incorporated into the dark 
areas of increasing density through the process of “curdling”, according to Mandel- 
brot's terminology. In contrast, Figure 7 depicts a multilinear, two-dimensional ar- 
rangement derived from the dispersion of the textual matter on the plane over 
several levels of generality and specificity with the decrease in scale. 


19 See also “LINEST function”, Excel for Microsoft 365,
https://support.microsoft.com/en-us/office/linest-function-84d7d0d9-6e50-4101-977a-fa7abf772b6d, accessed July 27, 2023.

I would therefore argue that texts may exhibit fractal geometry, which reveals
a stratified conceptual structure on layers of generality and specificity at
various scales. We can recall the procedure described in Section 3.2 and formula 
(1) that assigns a segment to the level with the highest score. This score considers 
the word count and the probability of words belonging to that level, as well as 
their average distance. If we imagine the segments as a form of dust, they seem to 
be attracted, at different scales, to levels that correspond to various degrees of 
generality and specificity. This “force of attraction” may be determined by the ex- 
istence of more compact and coherent clusters of meaning in the text, which are 
more likely to belong to a certain level as compared with the other levels. 

Following this interpretation, a closer look at Figure 7 suggests that at the 
largest scale, when the segment size coincides with the length of the whole book 
in number of words, this segment is “attracted” to the most specific level 7. It 
therefore seems that clusters containing words that are more specific are preva- 
lent in conceptually depicting Brook’s book at a global scale. However, for smaller 
and smaller scales, the dust segments are gradually scattered and attracted to- 
wards more generic, surface levels, as shown by the following iterations in the 
figure. This raises the question of whether this attraction and the movement of 
“dust” fragments from one level to another as the scale changes may indicate the 
existence of a certain type of pattern, correlation or persistence over a longer 
range, or whether it is completely random. 


4.2 Memory 


To determine whether such a “memory” of the text exists, I performed an analysis
of fluctuation and long-range correlation for the selected corpus. The method was
proposed by Peng et al. (1992) to uncover the correlation between basic structural
units of nucleic acids over long distances by mapping nucleotide sequences onto
a so-called “DNA-walk”. In this random walk model, a function u(i) describes a
walker's move through two values, u(i) = +1 if the walker moves up and u(i) = -1
if it moves down, for each step i of the walk. Based on the “net displacement”,
the sum of the unit steps u(i) after l steps, Peng et al. (1992: 168) compute a
measure called “root mean square fluctuation”, F(l), that characterizes the average of
the displacement. A power law function of the form F(l) ~ l^α where α ≠ 1/2 may
indicate a long-range correlation in the considered walk, while a value of α = 1/2 can be
the indicator of a random walk. A straight line on the plot log(l) vs log(F(l)) would
confirm a power law between the two measures, with α representing the slope
calculated through the method of the least squares. Studies in linguistics, such as
those by Pavlov et al. (2001), have applied the method to investigate long-range
correlations between letters and combinations of symbols in English novels. In
her analysis of children's language, Tanaka-Ishii (2018: 8, 5) observed that “long-range
correlation is due to the arrangement of frequent words and rare words”
and that “rare words tend to cluster”.

The aim of my analysis was to determine whether any long-range correlation
can be observed in the “walk” of the word segments from one level to another at
different scales of representation. I considered the iterations k = 2 to 6. The function
u(i) modelled the walk through the values: +1, if a segment j was followed in
the text by a segment j+1 placed on an upper level (move up); -1 for a move down
to a lower level; and 0 for a segment j+1 remaining on the same level as segment j.
The number of steps was l = Ns-1, where Ns represented the number of segments
detected for each iteration k. The fluctuation F(l) was computed in Excel using the
formula proposed by Peng et al. (1992: 168). Figure 12 shows the plots for log(l) vs
log(F(l)) for the iterations 4-6 (left to right) and Brook's book.
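
The walk and its fluctuation can be sketched as follows. This is an illustration in Python, not the Excel computation used for the chapter. It assumes that level 1 is the uppermost (most generic) level, and that the fluctuation follows the definition usually attributed to Peng et al. (1992), namely the standard deviation of the displacement over all windows of l steps. The function names and the random level sequence are hypothetical.

```python
import math
import random

def level_walk(levels):
    """Map consecutive segment levels to steps u(i) in {+1, -1, 0}.

    Assumption: level 1 is the top level, so a smaller level number
    on the following segment counts as a move up.
    """
    steps = []
    for current, following in zip(levels, levels[1:]):
        if following < current:
            steps.append(1)
        elif following > current:
            steps.append(-1)
        else:
            steps.append(0)
    return steps

def fluctuation(steps, l):
    """Root mean square fluctuation F(l) of the net displacement."""
    if not 0 < l <= len(steps):
        raise ValueError("l must lie between 1 and the number of steps")
    y = [0]  # cumulative displacement after 0, 1, 2, ... steps
    for u in steps:
        y.append(y[-1] + u)
    deltas = [y[start + l] - y[start] for start in range(len(y) - l)]
    mean = sum(deltas) / len(deltas)
    mean_sq = sum(d * d for d in deltas) / len(deltas)
    return math.sqrt(max(mean_sq - mean * mean, 0.0))

def estimate_alpha(steps, l_values):
    """Least-squares slope of log(F(l)) against log(l)."""
    points = []
    for l in l_values:
        f = fluctuation(steps, l)
        if f > 0:  # the logarithm is undefined for a zero fluctuation
            points.append((math.log(l), math.log(f)))
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical level sequence with a dominant level 2, for illustration only.
rng = random.Random(42)
levels = [rng.choice([1, 2, 2, 2, 3]) for _ in range(2000)]
steps = level_walk(levels)
print(round(estimate_alpha(steps, range(2, 31)), 3))
```

The range passed to `estimate_alpha` plays the role of the linear region selected on the plot, so the estimate depends on that choice.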

Figure 12: Fluctuation in the “walk” of word segments through levels for iterations k = 4 to 6
(Brook 2009). Panels: log(l) vs log(F(l)) for ε = 560, 140 and 35, each shown in full and
restricted to the linear region (l = 2-9, 2-30 and 2-509 respectively).

For larger scales (k = 2 to 3), the linear regions of the curves log(l) vs log(F(l))
covered shorter portions of the graph. As illustrated in Figure 12, for decreasing
scales (k = 4 to 6), the diagram started to exhibit more visible linear regions,
especially in the first part of the curves (left). A selection of the linear portions (right)
allowed the values of α to be computed as the slope of the plots for these zones. As
ε (the size of the segment) decreased, α took values from 0.1917 to 0.4218 (Appendix,
Table 6, Brook). What can therefore be inferred about the segment “walk” through
levels and the degree of correlation over longer distances corresponding to this
walk? The values of α below 0.5 seem to suggest an “anti-persistent” behavior or
“mean reversion” (Ijasan et al. 2017; Saha et al. 2020; Hu et al. 2021) when a move in
one direction is followed by a move in the opposite direction. This behavior refers
to the linear regions of the graphs (Figure 12, right), i.e. to a number of steps l = 9,
30 and 509 and segment size ε = 560, 140 and 35. In the case of Brook (2009), these
segment sizes roughly correspond to blocks of 7-8 paragraphs, 2-2½ paragraphs and
1-3 sentences respectively. As the segment size decreases, the range in number
of steps with walk correlation apparently increases. Shorter blocks at smaller
scales would therefore exhibit longer memory. Table 6 (Appendix) summarizes the
results of the experiments calculating the long-range correlation in the walk for the
original and shuffled data.²⁰ One can observe that for this book, the segments of
140 words (~ 2-2½ paragraphs) display a smaller value of α than the segments of 560
and 35 words, which seems to be related to their relative immobility (a higher
percentage of no moves, when the segments remain on the same level for longer
periods) as compared with the other two block types. It is not yet clear why this
happens, but a possible explanation may be related to the inner configuration of
meaning of the book and the way this configuration is modelled by the algorithm
defined by formula (1). In other words, 140-word blocks may be more dependent
on the surrounding segments and thus less susceptible to independent movement
across levels than the two other block types. Although there are significant
differences between the values of α for initial and shuffled data, the latter still displayed
values of α < 0.5, thus indicating an anti-persistent behavior, while a more random
walk would have been expected. More shuffling rounds would be needed to draw a
conclusion, but it is possible that the relatively high proportion of no moves (60%-68%)
that characterized the walk at various scales had an impact on the results of
the shuffling process. Therefore, the segments seem to fluctuate or remain still
around or on some dominant levels at any examined scale. Mainly similar behavior
was observed for the other books in the collection. Section 5 will provide a discussion
of this aspect and its possible connections with Zipf's (2012) notion of specific-generic
balance in texts.

20 The levels assigned to segments at each scale were randomly shuffled. The procedure was
then applied in the same way as for the original data to model the walk and to compute the
fluctuation and the value of slope α in the linear region.
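
The proportion of moves and the shuffled baseline mentioned in this section can be illustrated with a small sketch. It is not the procedure used to produce Table 6: the function names, the fixed random seed and the sample level sequence are hypothetical, and level 1 is again assumed to be the uppermost level.

```python
import random

def move_shares(levels):
    """Percentage of moves up, moves down and no moves between neighbours."""
    ups = downs = stays = 0
    for current, following in zip(levels, levels[1:]):
        if following < current:
            ups += 1      # smaller level number: move towards the top
        elif following > current:
            downs += 1
        else:
            stays += 1
    total = ups + downs + stays
    return {
        "up": 100.0 * ups / total,
        "down": 100.0 * downs / total,
        "none": 100.0 * stays / total,
    }

def shuffled_copy(levels, seed=0):
    """Return the same levels in random order, leaving the input untouched."""
    rng = random.Random(seed)
    copy = list(levels)
    rng.shuffle(copy)
    return copy

# Hypothetical sequence dominated by level 7, for illustration only.
levels = [7, 7, 7, 6, 7, 7, 5, 7, 7, 7, 6, 6, 7, 7, 7, 4, 7, 7]
print(move_shares(levels))
print(move_shares(shuffled_copy(levels)))
```

Shuffling keeps the number of segments on each level unchanged and only alters their order, so a sequence with one dominant level can still contain a sizeable share of no moves after shuffling.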


4.3 Lacunarity 


Another concept of interest for this study was lacunarity, a
“measure of the ‘gappiness’ or ‘hole-iness’ of a geometric structure” (Plotnick
1993: 202). In the domain of fractal geometry, this may be applied, for example, to
distinguish objects with close or identical fractal dimensions, which display
differences in the distribution and size of the gaps. The “gliding box algorithm” is
one of the methods often applied to measure lacunarity (Allain and Cloitre 1991;
Plotnick et al. 1993; Da Silva 2008). The method involves first representing an
object against a grid of squares like the one shown in Figure 9. Then a box of variable
side length (e.g. ε, 2*ε, 4*ε, 8*ε, etc.) is placed on the upper left corner of the
grid and the number of occupied squares of side ε within the box is counted.
After moving the box one column to the right, the filled squares in the box are 
counted again. The process is repeated over all the rows and columns of the grid, 
and for different sizes of the gliding box. The lacunarity is defined as a function 
of the side of the gliding box and the number of squares occupied by the object at 
different scales. 
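
The gliding box procedure can be sketched as below. The chapter does not spell out the formula, so the sketch assumes the common formulation in which lacunarity is the ratio of the second moment of the box masses to the square of their first moment. The function name `lacunarity`, the one-cell sliding step and the small grid are illustrative assumptions.

```python
def lacunarity(grid, r):
    """Gliding-box lacunarity of a binary grid for a box of side r cells.

    grid is a list of rows holding 1 (occupied) or 0 (empty).
    """
    rows, cols = len(grid), len(grid[0])
    if not 0 < r <= min(rows, cols):
        raise ValueError("box side must fit inside the grid")
    masses = []
    # Slide the box one cell at a time over every row and column.
    for top in range(rows - r + 1):
        for left in range(cols - r + 1):
            mass = sum(grid[top + i][left + j]
                       for i in range(r) for j in range(r))
            masses.append(mass)
    mean = sum(masses) / len(masses)
    if mean == 0:
        raise ValueError("lacunarity is undefined for an empty grid")
    mean_sq = sum(m * m for m in masses) / len(masses)
    return mean_sq / (mean * mean)

# Hypothetical 4 x 4 grid, for illustration only.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
for r in (1, 2, 4):
    print(r, round(lacunarity(grid, r), 3))
```

Under this formulation a box of one cell gives the inverse of the occupied fraction, and a box covering the whole grid gives 1, so the values decrease as the box grows.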

While the fractal dimension measures “how much the object (or data) fills the
space”, the “amount of space-filled, or the mass in some sense”, lacunarity
measures “how the data fill the space”, the “spatial size of gaps and their structure
within a set” or the “mass distribution” (Di Ieva 2016: 10; Tolle et al. 2003: 131).
Lacunarity may thus indicate the “level of contagion between occupied sites at a
particular scale” or the “degree of spatial clumping or aggregation” of certain
populations (Plotnick 1993: 208). I considered that such a measure may be useful
for a closer analysis of the distribution of gaps and the movement of segments to 
one level or another with the change in scale. 

Figure 13 and Table 7 (Appendix) present the values and shapes obtained for
the measure of lacunarity in the books in the collection. To compute this measure,
I used the “gliding box algorithm” (Plotnick et al. 1993). For each scale and
iteration (k = 1-6) corresponding to a certain segment and grid cell side (ε), the
lacunarity was calculated using different values for the side of the gliding box, i.e.
2^0, 2^1, 2^2 ... as multiples of ε until a certain threshold for each iteration was
attained. The calculations also included the case of r corresponding to the side of
the grid, M, as the maximum value (r = 2^(i-1), 1 ≤ i ≤ k+2; r = M, i > k+2).²¹

Figure 13: Log lacunarity (Λ) by log side of the gliding box (r), iterations 1-6 (Brook 2009).
Panels: log(r) vs log(Λ) for ε = 35,893, 8,973, 2,243, 560, 140 and 35.

As noted by Plotnick et al. (1993), the highest values of lacunarity were
recorded at each scale (k = 1-6) for values of r = 1 when the gliding box was equal
in size to the cell of the grid of side ε, while lower lacunarities were obtained
as the box side increased. At different scales, the curves exhibited a certain de- 
gree of linearity (Figure 13), which according to Allain and Cloitre (1991) is a fea- 
ture of self-similar fractals. In general, configurations with low variation in gap 
sizes display lower lacunarity, while objects with a wide range of gap sizes or 
larger areas of clumped sequences show higher values of lacunarity. For Brook 


21 Table 7 shows the specific values of r applied for each iteration.

(2009) the average lacunarity (Λ) increased with the decrease in scale and segment
size (ε) and in the fraction represented by the occupied cells in the grid (P)
(Table 7, Figures 7 and 9). This suggests that the variability of gap sizes increased 
with the decrease in segment size corresponding to the increase in segment 
movability from one level to the other. A comparison of the lacunarity measure 
of all the books selected for the study will be presented in the following section. 


5 Discussion 


The experiments performed so far with the corpus of seven books, five from his- 
toriography, one from literature and one from philosophy, indicate that an analy- 
sis at various scales of the texts may reveal fractal properties and a stratified 
structure.²² To detect these levels, I combined the concept of zoomable text (z-text),
as a starting point, with topic modelling and elements from fractal theory and appli- 
cations. Although some limitations were identified and further analysis is needed, it 
can be argued that the initial assumption of the text as a multi-layered conceptual 
construct seems to be confirmed. 


5.1 Attractors 


Various types of layers can characterize the internal organization of a text on lev- 
els, e.g. from simple to complex, abstract to concrete, global to local, etc. In the 
present study I modelled this type of layered pattern through the generic to spe- 
cific spectrum. To this end, I considered that topics spread over several docu- 
ments are more general than topics distributed mostly over a smaller number of 
documents or just one document. The existence of a specific-generic balance in 
semantic systems was formulated by Zipf (2012: 185) through the metaphor of the 
artisan involved in the task of classification by a number of n criteria and correla- 
tions, and the Principle of Least Effort. Specific correlations describe a small set 
of particular classes of events more completely, while generic correlations depict 


22 It should be noted that although theoretical fractals can go down indefinitely to smaller and 
smaller scales and we can imagine the scaling down of text to segments of size below 1 (word 
level), such as morphemes, letters, letter fragments, etc., for the present study I considered cut- 
offs in segment size above word level, as defined by the six iterations. See also Mandelbrot's 
(1983: 38) discussion on the Koch curve cascade of smaller and smaller promontories and the cut- 
off scales applied to real coastlines.

a larger set of classes but less completely. The balance between specific and generic
correlations would therefore be maintained by the artisan in his attempt to
generalize upon the basis of specific correlations, and particularize upon the 
basis of generic correlations, with the aim of minimizing n and the classification 
effort. Approaching the same question but from a different perspective, Lafon 
(1981) proposed a probabilistic model to discern between basic (non-specific) and 
specific forms in a corpus divided into parts. 
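
The criterion stated above, that topics spread over several documents are more general than topics concentrated in a few, can be expressed through an entropy measure over units. The sketch below uses the Shannon entropy of a topic's weight across units. The exact computation performed by the topic modelling tool is not reproduced here, and the function name and the sample weights are hypothetical.

```python
import math

def document_entropy(weights):
    """Shannon entropy (in nats) of a topic's distribution over units.

    weights holds the non-negative weight of one topic in each unit,
    e.g. the number of words assigned to the topic per chapter.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("the topic must occur in at least one unit")
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probabilities)

# Hypothetical topics over eight chapters, for illustration only.
topics = {
    "spread evenly": [50, 50, 50, 50, 50, 50, 50, 50],
    "mostly two chapters": [5, 0, 190, 200, 0, 5, 0, 0],
    "single chapter": [0, 0, 0, 0, 400, 0, 0, 0],
}
ranked = sorted(topics, key=lambda name: document_entropy(topics[name]),
                reverse=True)
for name in ranked:
    print(name, round(document_entropy(topics[name]), 3))
```

Sorting by decreasing entropy orders the topics from broad to narrow unit coverage, which corresponds to the generic to specific spectrum used for the levels.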

My hypothesis was that this type of inference involves several degrees of gen- 
erality and specificity that can be examined through longitudinal and transversal 
cuts of texts into units (e.g. chapters, parts) and levels (Figure 2), and different 
scales of observation. I assumed that a gradual unfolding of generic and specific 
arguments can be observed in the global-micro history texts built upon thematic 
aspects that varied from broad worldviews to minute examination of distinctive 
historical events, people, objects or points in time. Texts from other domains such 
as literature and philosophy were also presumed to exhibit a gradual relationship 
between generic and specific elements, from words defining the general theme 
and basis of communication to localized forms characteristic of certain units 
only. The topic modelling approach used in level detection offered a first glimpse 
into this type of conceptual structure. Table 3 lists the number of levels detected 
for each text in the collection against the number of words and units for each 
book. The influence of these two factors considered in isolation is not clear. A 
closer look at the percentage of intermediary levels and topics and the distribu- 
tion of topics by level may suggest a possible explanation. 


Table 3: Number of detected levels, book length (MALLET words), analysis units (chapters or parts),
and intermediary levels and topics (sorted by length).

Book                                          Length  Units  Detected  Intermediary  Intermediary
                                                             levels    levels (%)    topics (%)
1688. A Global History (Wills 2001)           51,846  8      8         62.50         35.00
Gulliver's Travels (Swift 2009)               37,892  5      9         66.66         75.00
The Inner Life of Empires (Rothschild 2011)   37,112  9      10        70.00         85.00
Vermeer's Hat (Brook 2009)                    35,893  8      7         57.14         65.00
Plumes (Stein 2008)                           29,023  7      9         66.66         65.00
Beyond Good and Evil (Nietzsche 2009)         24,196  10     7         57.14         45.00
The Two Princes of Calabar (Sparks 2004)      16,504  7      6         50.00         40.00
If we exclude the most generic (the first two) and the most specific (the lowest
plateau) topics in the diagrams (Figure 14), we can infer which books exhibit a
larger proportion of their topics and levels in the intermediary area. Thus, the
books with a higher number of levels (≥ 8) in Table 3 are also those with a higher
number of intermediary levels, such as Wills (2001), Swift (2009), Rothschild (2011) 
and Stein (2008). The number of detected levels may therefore be influenced by 
the length of the texts and the number of units, the way in which the authors 
shape their discourse through generic and specific classes of words and topics, 
and also classes of words and topics that belong to the area in between. 
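
The percentages of intermediary levels in Table 3 appear consistent with a simple calculation: remove three levels (the two most generic and the most specific) and divide by the number of detected levels. This reading is an inference from the published figures, not a formula given in the chapter, and the function name is hypothetical.

```python
# Detected levels per book, copied from Table 3.
detected_levels = {
    "Wills (2001)": 8,
    "Swift (2009)": 9,
    "Rothschild (2011)": 10,
    "Brook (2009)": 7,
    "Stein (2008)": 9,
    "Nietzsche (2009)": 7,
    "Sparks (2004)": 6,
}

def intermediary_share(levels, generic=2, specific=1):
    """Percentage of levels left after excluding the two extremes."""
    return 100.0 * (levels - generic - specific) / levels

for book, levels in detected_levels.items():
    print(book, round(intermediary_share(levels), 2))
```

The results match the table up to rounding: nine levels give 66.67 where the table shows 66.66.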

Studying the texts at various scales revealed that some levels act as “attrac- 
tors” of segments (considered as “fractal dust”). The movement of segments from 
one level to another does not appear to be random. As shown in Section 4.2, this 
movement seems to be characterized by a particular type of “memory” of the segments
and values of α situated below 0.5, which would correspond to an “anti-persistent”
behavior. Table 6 presents a summary of this type of memory. One
can observe a certain symmetry in moves up and down and a relatively high per- 
centage of no moves or stationary behavior of the segments for all the books in 
the collection, which may indicate segment fluctuation around a dominant level 
at the smaller scales. Why this happens is not yet completely clear. As shown in 
Figures 7 and 15, the books as whole aggregates start on a more generic or specific 
level, and then, with the decrease in scale, a certain equilibrium between generic 
and specific tends to be established by the migration of segments from one level 
to another. The tension between generic and specific alluded to by Zipf (2012) 
therefore seems to operate at the smaller scales (and possibly word level) charac- 
terized by higher segment “mobility”. It should be noted, however, that the ge- 
neric-specific dichotomy is not binary, but multi-value and involves different 
degrees, or levels. 

Table 6 provides insights into segment mobility at smaller scales. The lowest
values of α are displayed for Stein (2008) and Wills (2001), the former with a generic
dominant level, the latter with a specific dominant level (Figure 15), and segment sizes
of 453, 113 and 810 words.²³ These types of block therefore seem less mobile for
these books, given the high percentage of no moves that characterizes them, or 
may follow a movement logic that is only feebly anti-persistent. For all the books, 
the memory interval (in number of steps) increases with the decrease in scale 
(segment size), and for more than half of the books (Rothschild, Stein, Wills and 


23 Corresponding to 5½-6½, 1-1½ and respectively 9-11 paragraphs.
Figure 14: Topic distribution by level and book in the collection (sorted by length, left-right, top-down).


Swift), α increases with the decrease in scale.²⁴ These books, as can be observed
in Figure 15, present a dominant level, with the highest density of segments, at all 


24 It would be interesting to investigate the value of α in the case of segments of 1 word in size,
to see if it approaches 0.5, representing a random walk, or if it remains below this value, and

scales. In this case, anti-persistent behavior would consist in a tendency of most
segments to remain on the dominant levels, except for those that have enough
mobility (or energy) to escape their attraction. The increase in α seems to capture
this phenomenon, since this form of energy appears to increase with the decrease
in scale. Thus, Zipf's property would manifest itself not as a generic-specific
balance, but as a tendency to maintain stability around a certain level of generality
or specificity. Two books (Brook and Sparks) exhibit a different pattern through a
slightly lower value of α for the middle sizes (k = 5). In the case of Brook (2009),
this may be related to a higher value for the number of no moves, as explained in
Section 4.2. It should also be noted that this book contains the longest memory
interval and the highest value of α for the smallest scale studied (k = 6). This may
be due to a certain equilibrium between the occupation of generic, specific and
intermediary levels (Figure 7, bottom, right) and the strategy, discernible at this
scale, of an unfolding of detail vs global view, constructed around the eight
artworks chosen “not just for what they show, but for the hints of broader historical
forces that lurk in their details” (Brook 2009: 7). For Sparks (2004), the lower
value of α (k = 5) may be related to the nature of the units themselves,²⁵ whose
fluctuation suggests a slightly lower tendency to return to the dominant level at
every move than the units corresponding to the previous iteration (k = 4).²⁶ The
case of Nietzsche (2009) is more intriguing, since it shows a decrease of α with the
decrease in scale. This behavior could be caused by the sensitivity of smaller 
scales to the style of this philosophical text that alternates very long assertive sen- 
tences with very short questions, which may impose a logic of segment assign- 
ment to levels that is perhaps less anti-persistent in nature than when articulated 
at a larger scale. 


5.2 Dispersion 


The study of the books at different scales and their fractal geometry seem to offer 
a new standpoint on text as a conceptual object. This may reveal a certain type of 
dynamics in the stratification of levels and the way in which these levels attract 
word segments with changes in scale. Such behavior may be related to the aggregation
of clusters of meaning, correlation over long distances and the modes in
which conceptual building blocks of variable sizes are formed in language.


thus continues to show an anti-persistent behaviour. The time allocated to the writing of this
chapter allowed only for partial experiments of this type, meaning that it is not possible to draw
a conclusion at this stage.

25 ½-1 paragraph.

26 2½-3 paragraphs.


Table 4: Fractal dimensions and generated statistics by book (sorted by dimension).

Book                                          Fractal dimension (D)  R²           FDIST
Gulliver's Travels (Swift 2009)               1.032885269            0.997971565  1.544E-06
Vermeer’s Hat (Brook 2009)                    1.023331752            0.996924745  3.55009E-06
The Two Princes of Calabar (Sparks 2004)      1.0148522              0.992653067  2.02913E-05
Plumes (Stein 2008)                           1.01428358             0.997639195  2.09167E-06
1688. A Global History (Wills 2001)           1.00456998             0.997032804  3.30487E-06
The Inner Life of Empires (Rothschild 2011)   0.99646761             0.988888016  4.64762E-05
Beyond Good and Evil (Nietzsche 2009)         0.994695321            0.991815719  2.51873E-05


All the books in the collection showed a fractal dimension (D) slightly below or
above 1 (Table 4) and a resemblance with the category of dusts (Mandelbrot 1983).
Fractal dimension is considered an indicator of the degree of “change in detail” or
complexity that becomes apparent with the “change in scale” (Karperien and Jelinek
2016: 20) or a way to describe, together with lacunarity, the “visual look” of a
dataset (Tolle et al. 2003: 129). The books with the highest values of D are Swift
(2009), Brook (2009) and Sparks (2004), which show a higher degree of dispersion
of segments with the decrease in scale as compared with the others (Figures 7 and
15). While the books by Swift and Sparks exhibit lower values of average lacunarity
(Λ) (Table 7), Brook’s displays a higher value that may be interpreted as a marker of
higher variability in the size and structure of the gaps, and therefore a more complex
pattern of detail unfolding with the decrease in scale. Sparks' and Stein's are
very close in terms of fractal dimension but the texture of their segment distribution
differs by a higher average occupation fraction (P) and a lower average lacunarity
for the former and the reverse for the latter, which is also characterized by
a simpler detail pattern with a concentration of mass on the second level and larger
gaps. A simpler pattern, similar to Stein's, can also be observed for Wills', which
has the highest value of average lacunarity discernible through large areas of
empty space and a high density of segments on the last level at every scale. The
values of D, Λ and P of the last two books in Table 4 are somewhat harder to
interpret, because both exhibit a value of D that is less than 1 (although by a very
small amount), which brings them closer to the category of one-dimensional dusts,
despite a relatively higher average occupation fraction of the grid (P) as compared
with the others. What distinguishes the two books is a higher average lacunarity
for Nietzsche, and thus a higher variability of gap configurations, while for Rothschild
the average Λ is the lowest of the whole collection, possibly due to a more
homogeneous size and structure of the gaps. With these observations summarized,
the question that arises is how these measures relate to the initial assumption of
generality and specificity levels characterizing these texts from a conceptual point
of view.

Differences were observed in the level of generality or specificity to which 
the segments corresponding to the largest scale were assigned in the first itera- 
tion (Table 5). This level may be interpreted as the initiator by analogy with the 
construction of theoretical fractals such as the Koch curve and Cantor dust (Fig- 
ures 1 and 11). It is from this initial level that the dispersion of segments towards 
other levels begins when the reduction in scale is applied through the 6 iterations. 
This may suggest that from a global perspective, the words belonging to the initia- 
tor tend to group together in more compact or coherent clusters than those at 
other levels. An additional hypothesis may consider these levels as potential start- 
ing points in the writing process by the real or a hypothetical author, or by an 
automatic process of text generation. 

As also illustrated in Figure 15 (read top-down), there are four books with an 
initiator corresponding to the more generic levels 1 and 2 (Rothschild, Swift, Stein 
and Nietzsche). The second row in the figure shows the distribution by level of seg- 
ments with a size usually comparable to that of a unit (chapter or part). Recall that 
generic and specific levels are based on topics with respectively larger and smaller 
values of document entropy, which means broader or narrower unit coverage. 
Rothschild’s initiator, placed on level 2, was assigned to a topic with large coverage 
and top words that refer to members of the Johnstone family (john william james 
george betty alexander) or to specific conditions, places and entities (slaves scotland 
grenada east india company). A closer look at the contexts of these words in the 
book revealed biographical details and fragments of letters and documents from 
various archives (via citations of primary and secondary sources), which can be 
associated with a microhistory perspective. For subsequent iterations, segments of 
smaller size spread either up, to level 1, corresponding to a topic with broad cover- 
age (johnstones empire information history slavery ideas) and thus a view appar- 
ently closer to a macro-history perspective; or down, to deeper levels (i.e. 4-7) that 
cumulated more localized topics in terms of unit coverage but were variable in 
terms of micro- vs macro-historical standpoints (henrietta illness anxious litigation; 
individuals historians enlightenment microhistory). Rothschild’s book therefore 
seems to be articulated around micro-historical characters and events, a conceptual 
unifying stratum discernible from a bird’s-eye view and more localized macro- 
historical arguments that become visible in the layered representation only with a 
decrease in the scale of analysis. Stein’s initiator was also placed on level 2, corre- 
sponding to a dominant topic (ostrich feathers trade industry plumes jewish). The 
decrease in scale resulted in the migration of segments either to the upper level 




Figure 15: Segment distribution by level at different scales for six books. Panels: Rothschild (2011),
Sparks (2004), Stein (2008), Wills (2001), Swift (2009), Nietzsche (2009).




Table 5: Initial level by book corresponding to the largest scale, iteration k = 1
(sorted by number of levels).

Book                                          Initial level  Total levels
The Inner Life of Empires (Rothschild 2011)   2              10
Gulliver’s Travels (Swift 2009)               1              9
Plumes (Stein 2008)                           2              9
1688. A Global History (Wills 2001)           8              8
Beyond Good and Evil (Nietzsche 2009)         1              7
Vermeer’s Hat (Brook 2009)                    7              7
The Two Princes of Calabar (Sparks 2004)      6              6
based on a general topic (jews modern global commerce history commodity) or to
deeper levels (7, 9) corresponding to topics pertaining to manufacturers and the ostrich
feather trade in different regions of the world, in Africa, America and Europe.
Unlike Rothschild, Stein predominantly seemed to adopt a global history perspec- 
tive with synthesis explanations that accompanied the microhistory accounts, no- 
ticeable at smaller scales of the visual representation (which I will call map from 
now on). The initiators in the case of Swift and Nietzsche occupied level 1, with a 
very generic topic (great made time people good found hundred court) for the first 
and a similarly generic one for the second (man men good time great soul life taste 
world people love morality). With the decrease in scale, the fluctuations of segments 
revealed some of the intermediary and deepest levels of the maps. For Swift, the 
dispersed segments were placed mostly on levels 2, 4, 8 and 9, attached to topics 
that went from more general (country master reason nature honour), through me- 
dium generality referring to objects, situations or institutions common to different 
places visited by the protagonist (majesty majesty's left palace royal; yellow settled 
tolerable disposition), to more localized aspects specific to a certain country or 
event (yahoos houyhnhnms yahoo; emperor blefuscu imperial; island king luggnagg). 
For smaller scale iterations in Nietzsche's text, the intermediary level 4 and the spe- 
cific levels 6 and 7 were more prevalent. These levels referred to topics with me- 
dium coverage in the book (recognized process psychological; unconditionally utility 
experienced; love woman vanity) and to more localized, philosophical themes (judg- 
ments sensation faculty impulses; skepticism germany greatness scientific; morals 
morality herd gregarious). 

The other three books (Brook, Sparks, Wills) exhibited a different pattern 
containing an initiator on the deepest level and segment dispersion gradually in- 
volving the upper levels with the decrease in scale (Figures 7 and 15). For Brook, 
also discussed in Sections 3 and 4, the second iteration produced a movement up 
of the segments corresponding to chapters 1, 4 and 6, i.e. from level 7 to level 1 for 




the first and level 6 for the two others. These chapters were the first to move 
since they were probably less stable with their most specific topics (T18, T7, T1) 
belonging to level 6 (Figures 4, 5, 14). The next iterations produced fluctuations of 
segments from these chapters between levels 6, 1 and 2, while the segments from 
the other chapters mainly fluctuated between levels 7, 1 and 2. Excerpts of top 
words from the most generic and specific topics of this book (Table 1) and the 
movements of segments (Figure 7) suggest a prevalence of the micro-historical 
perspective at larger scales, gradually balanced by global history arguments that 
become more discernible on the map with the decrease in segment size. Sparks’ 
case was more intriguing since it displayed an initiator placed on level 6, and in 
the second iteration, a movement up to levels 1, 2 and 3 of most of the chapter 
segments, except for chapter 3 (Figure 15). A possible explanation of the relative 
stability of chapter 3 on the last level resides in its higher percentage of specific 
topics (T11, T19) (see also Figure 14). Although globally attached to the most spe- 
cific level, the book exhibited a variable pattern at smaller scales. Thus, for subse- 
quent iterations, the first third fluctuated around the top levels 1, 2 and 3 with 
topics mainly related to the slave trade (robin johns slave trade; traders slaves 
trade calabar; town king english captains; atlantic africa numbers individuals). 
The two other thirds showed an increasing density of segments on levels 4 (end) 
and 6 (middle), corresponding to intermediary or narrower themes and events 
(wesley charles africans god christianity; roseau mortality shore believed night). 
This threefold pattern suggests a certain similarity with Swift's distribution, also 
displaying a higher concentration of segments on the last level in the middle of 
the book; this may correlate with Sparks' argument unfolding style, which seems 
closest to that of a storyteller of all the five history books analyzed. Wills' map 
showed the simplest configuration, with an initiator on level 8 and fluctuations 
involving levels 1 and 2 with the decrease in scale (Figure 15). What is surprising 
about Wills' text is the higher segment density on the deepest level in contrast 
with sparser inserts at the upper levels at every scale, as compared with the other 
books. This pattern can be explained by the “baroque” composition of the book 
(as also suggested by its “Baroque Prelude”), intended to create the “portrait” of
one year, 1688, from discrete depictions of people, places and events around the 
world at that particular time. The global perspective is therefore constructed by 
general topics from the upper levels 1, 2 and 3 (time world year good long work; 
great people power made years trade; world voices voice sense human baroque), 
while the bottom level 8 cumulates topics with more precise but narrower scope 
(jews thy children jerusalem; spanish slaves coast portuguese; william king england 
james; muslim mughal ottoman hindu). The relative sparsity of segments on the 
upper levels and the abundance of mass on the lowest level can therefore be at- 
tributed to Wills’ method itself, based on “[s]erendipity, surprise, and letting one 
thing lead you to another” (2001: XI), which involves the author less and the
reader more in making connections and fitting together the pieces of the global 
history puzzle of 1688. 


5.3 Informational granularity maps 


The combined approach of topic modelling and fractal geometry led me to scalable 
representations of the analyzed texts (Figures 7 and 15) and their generic-specific 
dynamics, representations that I will call informational granularity maps. The term 
map was inspired by Bjornson's (1981) “cognitive mapping” and its role in the
“comprehension of literary texts”. Bjornson distinguishes two modes of thought in the
elaboration of a textual image. The first refers to the construction by readers of a
“general idea” or “image” about what they are reading, i.e. “a poem, a play, a
novel” and “how it can be expected to operate as they read”. This general idea is
then made more specific with the progression of the reading process, in the same
way as “archeologists confronted by a heap of potsherds, start with a general idea
about the nature of pottery and gradually refine that idea as they reconstruct a
particular pot” (1981: 58). The second mode of thought is related to the hypotheses
readers make about the world and the confirmation, alteration or denial of these
hypotheses as they continue to read, operations through which “information is
added to the textual image by assimilation and accommodation”. These two modes
of thought would therefore produce a flexible cognitive construct, a “schematized 
map of the text and its imaginary territory — a map that facilitates remembering 
what has been read”. Bjornson also assumes that although these constructions will
differ from person to person, there are invariant features that “tend to recur in
different readers' mapping of the same text” (1981: 59). Ryan used Bjornson's
concept of cognitive maps in a narrower sense, referring to a “mental model of spatial
relations” (2003: 215). She conducted a series of experiments with high school
students, who were asked to draw maps of the story world of the Chronicle of a Death
Foretold by Gabriel García Márquez, to investigate the readers' mental construction
of the narrative space. Ryan notes that text processing, in the process of reading,
operates at different levels, “words, sentences, paragraphs, passages”, to which one
may add the level of the “global meaning or narrative macro-structure” (2003: 234).
She also discerns two types of memory involved in the reading process, the
“long-term memory”, where the global representation of a text is stored, and the
“sketchpad of short-term, or episodic, memory” affected by “smaller textual units”, where
the readers form their “most detailed visualizations” or “picture-like
representations” (2003: 234). Based on the results of her experiments, Ryan concludes that
early in the reading process readers create a global but schematic representation
of the spatial configuration of the textual world, and that they then concentrate on
the plot, characters and visualization of the current scene, without the need to reor- 
ganize the whole map, which remains relatively resistant to new input. According 
to Ryan, this would explain the differences but also the common elements in the 
students’ sketches, which, although not fully equivalent to cognitive maps, may
document the selective work of long-term memory. 

My visual representations (Figures 7 and 15) did not involve experiments 
with readers and their cognitive constructs of the textual world or the spatial con- 
figurations expressed within it; instead they were built through automatic analy- 
sis, namely, from the perspective of the texts themselves and a particular type of 
information carried by them, independently of the readers. Thus, the term infor- 
mational was chosen, also inspired by studies in information and communication 
theory (Shannon 1948; Dretske 1999; Resnik 1995). These visual representations 
were intended to illustrate the informational granularity of the analyzed texts. 
That is, how texts, cut into smaller and smaller units of analysis, may change 
their geometry according to the reconfiguration at different scales of the spec- 
trum of generic to specific themes characteristic to each text. I considered these 
barcode-like representations as maps²⁷ that depicted how the initiators, and their
positioning on a generic or specific level, encompassed at global scale a sort of 
“long-term memory" of the texts considered in their entirety. Shorter segments 
also exhibited a certain type of memory, identified as mainly anti-persistent, pos- 
sibly indicating a tendency of the segments to fluctuate around dominant levels 
(or attractors) at different scales. It would be interesting to compare via dedicated 
experiments, e.g. inspired by cognitive map studies (Bjornson 1981; Ryan 2003), 
the levels assigned by the algorithm (formula 1) with the levels of generality or 
specificity assigned by human readers of the texts, for each of the six iterations 
considered in the project. The fractal particularities of these maps (dimension, la- 
cunarity) need further analysis. However, they seem to suggest some correlations 
between the visual characteristics of segment dispersion and the strategy of argu- 
ment or story unfolding of the books. This drew attention, for instance, to certain 
words and topics that synthesized and linked together the conceptual threads of 
the texts, as entities evenly distributed throughout the units of analysis (chapters 
or parts), or on the contrary, to elements that narrowed down the scope of the 
narrative through localized descriptions or focused analyses of detailed content. 
The topic modelling approach used in level detection offered a first glimpse into 
this type of layered conceptual structure, despite its inherent limitations related 


27 Some similarities with “genetic maps” were also observed (see for instance Fang et al. 2020: 4).
to topic instability and dependence on the choice of the number of topics. More
general techniques should be investigated as potential alternatives, for instance 
those derived from information, entropy and energy theory (Shannon 1948; 1951; 
Marcus 1970; Onicescu 1966), the study of lexical cohesion and lexical chains 
(Morris and Hirst 1991; Barzilay and Elhadad 1997) and the analysis of rare word 
clustering (Tanaka-Ishii and Bunde 2016). The potential connections between ther- 
modynamics, entropy, energy, the so-called “temperature of discourse” and the 
fractal dimension (Mandelbrot 1983: 347), and the stratified representations of 
texts and the dynamics of segment attraction to levels at various scales proposed 
in this study should also be further examined. 
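
The fractal dimension referred to above can be estimated, for instance, by box counting. The following minimal Python sketch illustrates the idea on a one-dimensional, barcode-like set of occupied positions. It is not the procedure used in this study: the function names (`box_count`, `box_counting_dimension`, `cantor_positions`), the choice of box sizes and the Cantor-dust input are assumptions introduced purely for illustration.

```python
import math


def box_count(occupied, size, box):
    """Count boxes of width `box` that contain at least one occupied position."""
    return len({pos // box for pos in occupied if 0 <= pos < size})


def box_counting_dimension(occupied, size, box_sizes):
    """Least-squares slope of log N(box) against log(1 / box)."""
    xs = [math.log(1.0 / b) for b in box_sizes]
    ys = [math.log(box_count(occupied, size, b)) for b in box_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def cantor_positions(depth):
    """Occupied cells of a Cantor-dust approximation on a grid of 3**depth cells."""
    positions = [0]
    for level in range(depth):
        step = 2 * 3 ** level  # keep the first and last third at each level
        positions = positions + [p + step for p in positions]
    return positions


# Hypothetical input: a Cantor dust standing in for a sparse row of segments.
dust = cantor_positions(6)
estimate = box_counting_dimension(dust, 3 ** 6, [1, 3, 9, 27, 81])
```

For this synthetic input the slope equals log 2 / log 3 (about 0.63), whereas a fully occupied row yields 1. On real segment distributions the estimate would depend on the chosen box sizes, which is one of the factors whose impact remains to be examined.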


6 Conclusion and future work 


The study proposed a method of text analysis that combined conceptual aspects 
from the model of zoomable text, topic modelling and fractal geometry. It was 
assumed that this type of methodology may assist in detecting different levels of 
generality and specificity in texts and reveal some characteristics of the assem- 
blage of blocks of text, above the word level, at different scales of representation. 
Applications of such an approach can range from hermeneutics and discourse 
analysis to text (and possibly z-text) generation and summarization. 

Further work will consist in deepening the analysis of measures such as lacu- 
narity, fluctuation and long-range correlation in conjunction with that of fractal 
dimension. A closer examination of the limitations of the applied techniques (e.g. 
impact on the results of certain factors in box counting and topic modelling) and 
the applicability of alternative methods from other fields of research such as in- 
formation theory, physics or genetics may also be envisaged. 
Abstract: The framework for historical argument derived from the print mono-
graph is increasingly untenable for digital historians. Digital argument is inclusive 
rather than selective in both its evidence and its form. The data required by digital 
methods produces multiple intertwined threads of interpretation and argument, re- 
sulting in arguments that are more expansive, larger in scale, than fit in a book. 
The alternative is obviously to present digital argument in its ‘native’ medium. This 
chapter analyzes how I conceive a form for one such argument, Harlem in Disor- 
der: A Spatial History of How Racial Violence Changed in 1935; a multi-layered hy- 
perlinked narrative that connects different scales of analysis: individual events, 
aggregated patterns and a chronological narrative. In three sections I lay out my 
understanding of the nature of digital argument, the options I consider for present- 
ing it and the details of the form that I decide to employ. 


Keywords: digital history, microhistory, narrative, argument, data 


The framework for historical argument derived from the print monograph is in- 
creasingly untenable for digital historians. Digital argument is inclusive rather 
than selective in both its evidence and its form. Using digital methods to analyze 
data results in arguments that are more expansive, larger in scale, than can fit in a book.
The data required by digital methods produces multiple intertwined threads of in- 
terpretation and argument, beginning with the process of creating data from sour- 
ces, extending to aggregations and analysis of that data at shifting scales and in 
various relations and contexts, and including data that does not fit a given argu- 
ment. To fit a book, or even an article, digital argument must be reduced in scale 
and complexity. For the short-form argument of a journal article, those compro- 
mises are manageable. There are increasing examples of such articles that provide 
models for digital historians of how to shape digital history for print journals (Rob- 
ertson and Mullen 2021). For a longform argument, the compromises are more fun- 
damental and examples have until very recently been rare (with several scholars 
opting to instead recast digital research into a print monograph, so none of its ori- 
gins were visible) (Ayers 2003; Brown 2020; Thomas 2020). 

The alternative is obviously to present digital argument in its ‘native’ me- 
dium. The first digital historians of the internet age were quick to recognize that 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial 4.0 International License.
https://doi.org/10.1515/9783111317779-013
the properties of the digital medium offered new forms for scholarly argument,
initially focusing on those possibilities rather than digital methods for analysis (Smith
1998; Ayers 1999; Rosenzweig 1999; Bass 1999; Ayers 2001; Thomas 2001; Robertson 
2004; Thomas 2004). Experimental online digital history journal articles were 
published by American Quarterly in 1999, the American Historical Review in 2000, 
2003 and 2005, and the Journal of Multi-Media History in 1998, 1999 and 2000, 
alongside experiments in other disciplines such as the journals Kairos and
Vectors.² Robert Darnton sought to promote similar experiments in long-form digital
argument, but the higher professional stakes attached to books resulted in the Gu- 
tenberg-e project he led being focused on transposing the monograph into an 
e-book. Then those experiments with forms of argument ended, at least in digital 
history.⁴ On the one hand, historians showed little interest in developing the new
approach to reading that the new forms required. On the other hand, digital his- 
torians’ attention turned toward other possibilities. The development of the open- 
source Omeka platform encouraged adoption of the exhibit as a form for history 
in the digital medium, primarily directed at non-scholarly audiences. Other digital 
historians focused on the affordances of the computer to process and visualize 


1 For a fuller elaboration of the narrative of the development of digital history in this paragraph, 
see my “The Properties of Digital History," History and Theory 61, 4 (2022), 86-106. 

2 “Hypertext Scholarship in American Studies,” 1999,
https://web.archive.org/web/20201114000152/http://chnm.gmu.edu/aq/; Ethington 2001; Thomas and
Ayers 2003; Censer and Hunt 2005; “Journal for MultiMedia History — Volume 3 (2000) Contents
Page,” accessed May 15, 2022, https://www.albany.edu/jmmh/; “Vectors Journal: Mobile Figures,”
accessed May 2, 2022, http://vectors.usc.edu/projects/index.php?project=54; “Kairos: A Journal of
Rhetoric, Technology, and Pedagogy,” Text (Kairos: A Journal of Rhetoric, Technology, and
Pedagogy), accessed May 21, 2022, https://kairos.technorhetoric.net/.

3 “Gutenberg-e Home,” accessed May 21, 2022, http://www.gutenberg-e.org/; “What Is the
Gutenberg-e Program? | AHA,” accessed May 21, 2022,
https://www.historians.org/about-aha-and-membership/aha-history-and-archives/historical-archives/gutenberg-e-program-(1999-2008)/what-is-the-gutenberg-e-program;
Manning 2004; Seaman and Graham 2012. While none of these projects used digital methods,
three authors did initially produce projects that used the digital medium to create layered or
non-linear arguments: Pohlandt-McCormick 2005; Lowengard 2006; Gengenbach 2005.

4 Until 2021, when the Journal of Digital History launched. The Journal of Multimedia History 
published only three issues, in 1998-2000. Vectors ceased publication in 2013, having published 
seven issues, in 2005-2007, and 2012-2013. Kairos continues to publish. William and Mary
Quarterly did publish a “born-digital article” in 2018, but it did not involve digital analysis and is
better described as a multimedia article. The article, which retained the form of a print article,
featured sound, images and animations without making use of the immersive and interactive 
properties of digital media. See Newman 2018; Piker 2018; Ayers 2019. 
data. The results of that work were typically presented in the digital medium as
visualizations. As a form that users explore through browsing, searching and in- 
teraction, data visualizations have an implicit argument rather than the explicit 
argument expected by scholarly audiences. This is the situation William Thomas 
highlighted in 2015, when he presented the next phase of digital history as one in 
which “scholars may be called upon to play a more purposeful role in making 
interpretive arguments, to establish genres of digital scholarship, to engage in 
meaningful critical review of digital scholarship, and to deal more forcefully and 
deliberately with the digital divides in our disciplines” (Thomas 2015: 532). 
Opportunities to take up Thomas’ call have been limited and risky for digital 
historians. Academic journals remain print publications, if anything with less 
scope for online experiments because of a reliance on the platforms of commer- 
cial publishers and the challenges of sustaining the increasingly diverse and com- 
plex components of digital history projects. Academic publishers, even as they 
adopt e-book platforms, have lacked the infrastructure to publish digital
arguments on a scale akin to monographs. It has been a professional risk for digital
historians to commit to a digital format for a project that takes as long to
complete as a long-form argument and that generally carries significant professional
stakes, as the basis for the award of a PhD, hiring, tenure and
promotion, especially as the sustainability of digital publications remains uncertain.
However, just before Thomas’ call was published, the Andrew W. Mellon Founda- 
tion embarked on its ambitious Monograph Initiative to secure a future for long- 
form scholarly argument in the digital age (Maxwell et al. 2016). One strand of the 
projects funded by that initiative focused on publishing in a digital medium, in- 
cluding the establishment of Stanford University Press Digital Projects to publish 
scholarship that did not take the form of a book.⁵ In addition, beginning in 2018,
the Mellon Foundation partnered with the National Endowment for the Humanities
to offer fellowships to “support individual scholars pursuing interpretive
research projects that require digital expression and digital publication”.⁶ As a
result, in 2019, when I returned to work on a spatial analysis of the racial disorder 
in New York City's Harlem neighborhood on March 19, 1935, opportunities to pres- 


5 “Stanford Digital Projects,” accessed May 19, 2022, https://www.sup.org/digital/. The Press assumes 
responsibility for hosting and preserving the projects that it publishes; see Jasmine Mulliken, 
“Meanwhile, Behind the Server Scenes . . . ,” SUP Digital Blog (July 13, 2022), accessed July 2, 2023. 
https://blog.supdigital.org/meanwhile-behind-the-server-scenes/. For a discussion of the Digital Proj- 
ects published as of July 2022, see my “The Properties of Digital History." 

6 “NEH-Mellon Fellowships for Digital Publication,” The National Endowment for the Humanities,
accessed May 21, 2022, https://www.neh.gov/grants/research/neh-mellon-fellowships-digital-publication.
ent the research in a long-form digital argument for publication online existed in
a way they had not when I began my research in 2010. I am in the privileged posi- 
tion that pursuing those opportunities involved limited risk for me; as a tenured 
full professor, neither tenure nor promotion reviews are in my future. I success- 
fully applied for an NEH-Mellon Fellowship for Digital Publication in 2020 and in 
2021 I signed a contract with Stanford University Press to publish Harlem in Disor- 
der: A Spatial History of How Racial Violence Changed in 1935 as a Digital Project. 

In Harlem in Disorder I use granular data about events created from a variety of 
types of sources, mapped and analyzed in multiple relationships and at different 
scales, to argue that the violence was a complex mix of “hoodlums” taking the
opportunity to commit crimes, residents targeting white businesses that had been the
subject of political protests, poor and desperate people seeking food and goods they had
been unable to obtain by other means, young men acting out their frustration and 
boredom and bystanders passing their time on the street being drawn into violence. 
Interpretations of racial violence generally acknowledge its multifaceted nature but 
select one thread to emphasize, as required by the form of argument which fits a 
book. In the case of Harlem in 1935, it has been the attacks on white property, novel 
at that time, but central to racial violence in subsequent decades. That emphasis 
comes at the cost of isolating those incidents from contexts that shaped them and 
simplifying the character of racial violence and how it changed in the second half of 
the twentieth century. It also makes it possible to discount portrayals of Black partic- 
ipants in racial violence as pursuing political goals by pointing to incidents that did 
not fit that picture — the kind of distortion prevalent in the aftermath of the protests 
following the murder of George Floyd in 2020 (Howard 2022: 263-264). Retaining com- 
plexity, as digital argument allows, better captures the nature of racial violence and 
shows the balance and relationship between different forms of violence.⁷ For Harlem
in 1935, the resulting picture is of transition not simply change, with overlooked vio- 
lent clashes between Black residents and white men and women that echoed earlier 
outbreaks of racial disorder occurring around the more prevalent and novel attacks 
on white property, producing a new mix rather than an entirely new form of racial 
violence. 

This chapter analyzes how I conceived a form for that argument - a multi- 
layered, hyperlinked narrative that connects different scales of analysis: individual 
events, aggregated patterns and a chronological narrative. In elaborating this concept 
of a long-form digital argument, my aim is to add to the models for argument-driven 


7 This approach is indebted to Seligman 2011. 
digital history that my colleague Lincoln Mullen and I have identified.⁸ Picking up
Thomas' call, as well as that of Cameron Blevins, our focus was on forms of argument 
for projects of a scale of an academic journal article (Blevins 2016). As opportunities 
emerge for the publication of digital argument on a larger scale, in digital mediums, 
models are also needed to encourage and enable digital historians to publish their 
scholarship in that form. As of May 2022, Stanford University Press (SUP) has pub- 
lished seven Digital Projects and will soon publish Lincoln Mullen's America's Public 
Bible: A Commentary and several others.⁹ Harlem in Disorder adds to those models
one that should be accessible to a wide range of historians. As a microhistory that 
employs handcrafted data, the scale of my analysis employs a widely used framing 
and method that is not typically associated with digital history. My argument also 
departs further from the single linear narrative of a print monograph than most of 
the published projects. Finally, Harlem in Disorder is built on a freely available plat- 
form, Scalar, rather than as a custom site, offering another model for using a plat- 
form employed in two other SUP Digital Projects. In the following three sections I lay 
out my understanding of the nature of digital argument, the options I considered for 
presenting it and the details of the form that I decided to employ.¹⁰


1 The nature of digital argument 


I understand digital history argument as a combination of data as evidence and 
multiple interlinked threads of narrative interpretation. Digital tools and methods 
analyze data created from historical sources. This approach is not a return to the 
quantitative, social science history based on statistical analysis of the 1960s and 
1970s, but a different approach to data and its analysis. To mark that distinction, I 
label it “data-driven history” rather than quantitative history.¹¹ This approach uses


8 Arguing with Digital History working group, “Digital History and Argument,” White paper
(Fairfax, VA: Roy Rosenzweig Center for History and New Media, November 13, 2017). Accessed
July 2, 2023. https://rrchnm.org/argument-white-paper/; Robertson and Mullen, “Arguing with
Digital History”; “Models of Argument-Driven Digital History,” 2021. Accessed July 2, 2023.
https://model-articles.rrchnm.org/.

9 "Stanford Digital Projects." 

10 In making sense of that process, I relied particularly on the work of Edward Ayers, William 
Thomas, Fred Gibbs and Trevor Owens, Johanna Drucker, Lisa Gitelman, Harmony Bench and Kate 
Elswit, and the Arguing with Digital History working group, especially my collaborator Lincoln 
Mullen. 

11 Here I am giving meaning to a term used by several scholars without definition. Sharon Leon 
uses “data-driven history” without explanation in a forthcoming chapter; Sharon Leon, “The
Peril and Promise of Historians as Data Creators: Perspective, Structure, and the Problem of Re- 
a greater range of data than quantitative history and uses it in a wider variety of
ways. Digitization has made more data available in two senses, by making text
machine-readable, and by making historical records accessible online.¹² The greater
variety of uses of that data reflects new tools for analysis - text mining tools and
languages for unstructured textual data, but also mapping and network graphing tools
that encourage the creation of structured data about spatial locations and about
relationships.¹³ Digital history additionally uses data in exploratory ways, not only for
hypothesis testing and creating knowledge using statistical methods: anomalies, 
trends or unusual coincidences can be identified with simple frequency counts or 
correlations (Gibbs and Owens 2013: 168). A close reading of sources is then used to 
explain the patterns identified in the data, rather than seeking explanations in statis- 
tical models as quantitative history does (Gibbs and Owens 2013: 162). 
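
The exploratory use of simple frequency counts described above can be sketched in a few lines of Python. The records, their (category, month) layout, the helper names and the `factor` threshold are all hypothetical and chosen only for illustration; they are not data or methods taken from any of the projects cited here.

```python
from collections import Counter

# Hypothetical hand-made records: (category, month). Not data from any real project.
events = [
    ("strike", 3), ("petition", 3), ("meeting", 1), ("strike", 3),
    ("petition", 4), ("strike", 4), ("arrest", 4), ("petition", 4),
]


def frequency_table(records, key_index):
    """Count how often each value of the chosen field occurs."""
    return Counter(record[key_index] for record in records)


def unusually_frequent(counts, factor=1.5):
    """Return the values whose count exceeds `factor` times the mean count."""
    mean = sum(counts.values()) / len(counts)
    return sorted(value for value, n in counts.items() if n > factor * mean)


by_category = frequency_table(events, 0)
by_month = frequency_table(events, 1)
frequent_categories = unusually_frequent(by_category, factor=1.25)
frequent_months = unusually_frequent(by_month, factor=1.25)
```

A count that stands out in this way is only a prompt: in the approach described above, it is the close reading of the underlying sources, not the count itself, that supplies the explanation.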

Representing sources as data is treated by digital historians as an act of
historical interpretation. “Data are capta, taken not given, constructed as an
interpretation of the phenomenal world, not inherent in it,” in Johanna Drucker's


presentation,” [Bracket] (blog), 2019, Accessed July 2, 2023.
http://www.6floors.org/bracket/2019/11/24/the-peril-and-promise-of-historians-as-data-creators-perspective-structure-and-the-problem-of-representation/.
A variant, “data-driven scholarship,” was used by Kathleen Fitzpatrick (2011: 104), who derived
it from the title of a workshop organized by CHNM and MITH in 2008, with funding from IMLS,
“Tools for Data-Driven Scholarship.” None of the published material from that workshop explains
the term, but its focus associates it with the digital tools that shape the approach to data in
digital history. An alternative label, “computational history,” tends to be applied to only some
of the methods that rely on data, typically text analysis, and excluding mapping and network
analysis; see Arguing with Digital History working group, “Digital History and Argument,” White
paper (Fairfax, VA: Roy Rosenzweig Center for History and New Media, November 13, 2017),
Accessed July 2, 2023. https://rrchnm.org/argument-white-paper/. Alison Booth and Miriam Posner
noted in 2020, “We can detect another shift in nomenclature: data is perhaps edging out digital.”
They link this to the rise of data science within universities, for which DH can offer a
humanities perspective. In my use of the term, I'm suggesting that the shift does also reflect
the central place of data in the practices of digital humanists, that it is internal as much as
external. See Booth and Posner 2020.

12 Flanders and Munoz identify six “major types of research objects and collections that present
distinctive forms of data” created by humanities disciplines, focused mainly on literary studies.
Three are forms of textual data: scholarly editions, text corpora and text with markup. One is a
combination of textual data and digitized images, the thematic data collection; one is a product of
digital research methods, data with accompanying analysis or annotation (an image, a map, a virtual
3-D reconstruction); and one is a digital extension of a longstanding form, the finding aid. Julia
Flanders and Trevor Munoz, “An Introduction to Humanities Data Curation,” Digital Humanities Data
Curation (blog), accessed May 10, 2022, https://guide.dhcuration.org/contents/intro/.

13 Arguing with Digital History working group, “Digital History and Argument." 
influential formulation (Drucker 2011). Whereas humanities scholars usually
immerse themselves in sources, dive in, and understand them from within, to use
Miriam Posner's metaphor, creating data involves extracting information and
features from sources, requiring the decomposition of a subject or object into
abstract attributes and variables. Heterogeneous and ambiguous historical records
do not fall straightforwardly into categories (Schoch 2013; Borgman 2015: 28;
McPherson 2018: 130–131; Bench and Elswit 2022). In response, digital historians
employ an iterative process to create data: as their research progresses they en- 
rich and enlarge their datasets, making choices about classifications in response 
to explorations and interpretations of new information that change the data 
(Hoekstra and Koolen 2019: 80). That process can also counter the dehumaniza- 
tion involved in the transformation of historical subjects into abstract categories: 
as Harmony Bench and Kate Elswit point out, “The process of manually curating 
data from archival materials draws us closely into the lived experiences they 
index, as we grapple with the multiple and conflicting stories behind each data 
point, and what each signifies" (Bench and Elswit 2022: 39). Black digital practice, 
as exemplified by Jessica Marie Johnson, emphasizes the need to carry that
engagement forward, to “infuse the drive for data with a corresponding concern
with and for the humanity and souls of the people involved” (Johnson 2018).

This process of creating digital data requires more documentation than the 
citation of sources used for analogue research (Hoekstra and Koolen 2019: 81). 
Gibbs and Owens advocated that explanation take two forms, “history writing
that explicates the research process as much as the research conclusions,” and
“history writing that interfaces with, explains, and makes accessible the data that
historians use” (Gibbs and Owens 2013: 163–164). In both cases, those explanations
extend beyond what can be accommodated in footnotes. For computationally gen- 
erated data, explicating the research process generally means a narrative discus- 
sion of the computer scripts, models or simulations executed to create data. That 
process is the same for all the data created by that computational tool from a sin- 
gle kind of source. The same is not true of data created from multiple sources, 
which is more often handcrafted than computationally generated. General state- 
ments about that process reveal little about the choices made about individual 
data points. In creating data about the career of African American choreographer 
Katherine Dunham, Bench and Elswit “routinely cross-reference and reconcile


14 Miriam Posner, “Humanities Data: A Necessary Contradiction – Miriam Posner's Blog,”
June 25, 2015, Accessed July 2, 2023. http://miriamposner.com/blog/humanities-data-a-necessary-contradiction/.

15 For more detailed accounts of this process see Leon, “The Peril and Promise of Historians as
Data Creators”; Croxall and Rawson 2019; Erickson 2013.


342 – Stephen Robertson


information including from Dunham’s personal and professional correspondence,
contracts, company documents from receipt books and payroll to costume lists, 
personal logs, programs, scrapbooks, lighting plots, and newspaper clippings. To 
these we have added supplemental data sources, such as immigration records, 
local newspapers, and historical transportation maps and schedules” (Bench and 
Elswit 2020: 290). As that process is different for each data point, it cannot be fully 
captured by a narrative discussion of method. Instead, Bench and Elswit leave
traces of it, including sources and annotations, in their dataset (Bench and Elswit
2022: 291). 

Sharing datasets with such documentation, a practice borrowed from the so- 
cial sciences, is the most common form of the second type of explanation advo- 
cated by Gibbs and Owens. Humanities data, however, has unique features that 
require modifications to the practices used in the sciences. Outlining principles 
for curating humanities data, Flanders and Munoz note that “While data in non- 
humanities disciplines clearly carries an interpretive framework with it, in the 
humanities the interpretation is in some cases the primary object of interest, not 
just a perspective on the data that can be separated from it". This interpretation 
exists in layers, as “crucial decisions affecting the usability and meaning of
humanities data are made at each stage of data creation and management”.
However, even to the extent that scholars effectively record these multiple layers of
interpretation of data in a dataset, they are located at a distance from a narrative 
argument, limiting the extent to which history writing actually interfaces with 
the data in the way Gibbs and Owens envision. 

Although the gap created by the separate publication of datasets makes it diffi- 
cult to see, data changes the relationship between evidence and argument by multi- 
plying the connections between them. Data includes more than the evidence for an 
argument presented in a print narrative and its footnotes; it includes all the re- 
search material gathered for a project. Reflecting on the first decade of digital his- 
tory, Edward Ayers highlighted this contrast between historians' conventional 
research practices and working with data. “In conventional practice, historians
obscure choices and compromises as we winnow evidence through finer and finer
grids of note-taking, narrative, and analysis, as the abstracted patterns take on a
fixity of their own. A digital archive, on the other hand, reminds us of the
connections we are not making, of the complications of the past, every time we look at it”
(Ayers 2001: 6–7). “Databases tend toward inclusivity, narratives toward
selectivity”, as Katherine Hayles put it (Hayles 2012: 182).


16 Flanders and Munoz, “An Introduction to Humanities Data Curation.”
In addition, creating data produces “individual, separate and separable” items
that can be aggregated to allow their analysis in different relationships and at
different scales.17 Analysis of sources by immersing oneself in them, to use Posner's
metaphor for conventional humanities approaches, does not lend itself to
aggregation in the same way. In that case, data appears in context rather than being
separate and separable and is analyzed through “an iterative process of reading,
questioning, contextualizing, and comparing” from which relationships and
patterns are intuited rather than established by aggregation. By contrast, when data is
created from sources, it is divided and classified and can be grouped based on any
of its properties. “The power within aggregation is relational, based on potential
connections”, as Lisa Gitelman and Virginia Jackson put it: “network, not
hierarchy” (Gitelman and Jackson 2013: 8). Data can also be aggregated at different scales
(Bench and Elswit 2022: 38). Data's capacity to be aggregated, together with its
inclusivity, creates multiple connections between data and argument.
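
The idea that separable items can be grouped on any of their properties, and at different scales, can be made concrete with a minimal sketch in Python. Everything in it is an assumption made for illustration: the records, the field names ("kind", "place", "hour") and the helper `aggregate` are invented placeholders, not data or code from any project discussed here.

```python
from collections import Counter

# Invented placeholder records: the field names and all values are
# assumptions for illustration, not data from any project.
records = [
    {"id": 1, "kind": "kind-a", "place": "block-1", "hour": 20},
    {"id": 2, "kind": "kind-b", "place": "block-1", "hour": 21},
    {"id": 3, "kind": "kind-a", "place": "block-2", "hour": 21},
    {"id": 4, "kind": "kind-a", "place": "block-1", "hour": 23},
]


def aggregate(items, *properties):
    """Count items grouped by any combination of their properties."""
    return Counter(tuple(item[p] for p in properties) for item in items)


by_kind = aggregate(records, "kind")                     # one property
by_kind_and_place = aggregate(records, "kind", "place")  # a finer grouping
whole = aggregate(records)                               # the coarsest scale: one group
```

The same four items appear in each grouping; only the relationships drawn between them change. That is the sense in which aggregation multiplies the connections between data and argument, whereas a record read in its original context cannot be regrouped in this way.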

Given the nature of data as evidence, a data-driven argument takes the form 
of multiple threads of narrative and interpretations of data linked together. The 
print academic article and monograph have only a limited capacity to present 
complexity of that kind. Multiple arguments can be presented sequentially, but, 
as David Westbrook argued, that structure “pulls discursive strands apart" so that 
much of the “complex interplay” in which they “reinforce each other, highlight
each other through conflict, and reveal things together that they never could
apart” is “necessarily lost” (Westbrook 1999: 255). Footnotes, endnotes and
appendices provide a means of adding complexity to a narrative through
sources that support an argument, or relate it to the work of other scholars and 
to other topics. However, that material is limited in scale and is located on the 
page and within the text in a position that clearly subordinates it to the “main 
text” (Landow 2006; Ayers 2001: 7; Adema 2021: 29-30). A reader could potentially 
follow a reference in a note to another text, but for most that is not an option. 
While they might have access to an academic library that holds the secondary
literature, only a small proportion of the primary sources excerpted or referred
to would likely be available online. To view most, a reader would have to visit a
particular archive (Chartier 2011: 11). So most readers likely return quickly to the
“main text.” Notwithstanding the presence of notes, then, the print narrative
effectively remains largely closed to other threads of argument and interpretation.
Hence, long-form digital history arguments presented in books, and to a lesser
extent short-form arguments in articles, by necessity must omit elements of both
argument and data.


17 Leon, “The Peril and Promise of Historians as Data Creators.” 
In a digital medium, however, an argument can take the form of multiple
threads linked together, “as layered or branching or interweaving narratives”, to 
convey more of the complexity of the past than is possible in print argument 
(Ayers 2001: 7). Such an argument is necessarily composed of modular elements; 
each thread is a pathway that uses hyperlinks to connect elements in a particular 
order. Those threads are presented simultaneously, so elements can be linked to 
more than one thread, a form which better represents the multiple relationships 
among arguments and the multiple interpretations of sources and data than the 
traditional model in which evidence is subordinated to argument and a single 
thesis (Bass 1999: 276). As David Lloyd put it, the effect of digital argument is “to 
liberate contradictory and refractory threads in the material from the demands 
of a historically based argument, where they were necessarily smoothed over in 
the interest of coherence” (Hayles 2012: 38). 

While the links that create such a digital argument make it hypertextual, its 
form is not one in which the user creates meaning as imagined by early hypertext 
theorists. The threads available to a reader are structured by the author (Ayers 
2001: 8). Links to multiple, simultaneous threads do work to disrupt the linear 
flow of reading associated with the print book, but here that disruption is a
feature not a bug (Hayles 2012: 63-68). Data-driven argument fits the case Hayles
allowed for, a case in which, as Diana DeStefano and Jo-Anne LeFevre put it, the
“complexity of the hypertext experience is more desirable than maximizing
comprehension and ease of navigation”.19 Links call attention to the choices made by a
historian in constructing an argument (Drucker 2014: 178–179).

Rather than hypertext, it is the database that has come to represent the char- 
acter of the modular, structured form of digital argument. Although the content 
of digital media can be stored in databases, the term has been used more gener- 
ally, almost metaphorically, by scholars such as Lev Manovich, Ed Folsom and 
Hayles to highlight the separation of content and an interface that provides access 
to that content as the key feature of digital media (Hayles 2007; McGann 2007;
Folsom 2007a; Folsom 2007b; Price 2009). With that separation, “it is possible to
create different interfaces to the same material”, each of which embeds a particular
organization (Manovich 1999: 86). Narrative argument can be one form of
interface, but it is not implicit in the logic of the database (Manovich 1999: 83). As a
result, Manovich points out, the author has to do more than simply link database 
records to create a narrative, contrary to hypertext theory; they “also have to


18 Landow, Hypertext 3.0; Robertson 2004.
19 Diana DeStefano and Jo-Anne LeFevre, “Cognitive Load in Hypertext Reading: A Review,”
Computers in Human Behavior 23.3 (2007), 1636, quoted in Hayles 2012: 68.

control the semantics of the elements and the logic of their connection” (Manovich
1999: 87).

The idea of argument as an interface to a database also encompasses the 
other form in which data and argument are connected in digital history, in and 
through visualizations. As individual, separate and separable items, data lend 
themselves to spatial display, “are mobilized graphically,” as Gitelman and Jack- 
son (2013: 12; see Hayles 2007: 1606) put it. Such visualizations can present the 
complex, multivariate arguments of data-driven history not as pathways of linked 
modular elements, but in the form of “rich, browsable interfaces that reveal the 
scale and complexity of the data behind them, and provide a context that
enriches the exploration and interpretation of that data.”20 They show patterns in
data: numbers of frequently collocated or reused words in texts and connections
in network graphs, and patterns of spatial proximity, movement and interaction
with spaces in maps and 3D models. The designers of Photogrammar, for
example, used maps to make “visual arguments about the breadth and eclecticism of
FSA-OWI photography: ‘a choropleth county map of the U.S.A [. . .] for example’,
conveys one of Photogrammar's key historiographical interventions and the basis
of its claim for further attention: the FSA-OWI collection was national in scope,
covering much more than the images of poverty in the Dust Bowl and American
South that public history has tended to emphasize” (Cox and Tilton 2019: 134–145).
However, the tools borrowed from the empirical sciences with which
visualizations are created can work at cross-purposes with how digital historians work
with data, serving as “a kind of intellectual Trojan Horse, a vehicle through which
assumptions about what constitutes information swarm with potent force”, in
Drucker's metaphor. Those tools “assume transparency and equivalence,
as if the phenomenal world were self-evident and the apprehension of it a mere
mechanical task”, and so present data as certain. As a result, digital historians'
visualizations need to be reoriented to present data as created and interpreted from a
subjective point of view (Drucker 2014: 125–126, 130).


2 Presenting digital argument 


While there are a growing number of examples of making short-form digital his- 
tory arguments in print articles, there have not been examples of making long- 
form digital arguments in books. An exception appeared after I began work on 
Harlem in Disorder, namely Cameron Blevins' Paper Trails: The US Post and the Making of


20 Arguing with Digital History working group, “Digital History and Argument,” 26.

the American West, a spatial history like my project. Blevins used digital history
“to see the postal network in its entirety”, “processing, analyzing and visualizing” 
a dataset of “every post office that ever existed in the United States” (Blevins 2021: 
5). That digital project is connected to the book through the inclusion of a large 
number of maps and charts (accommodated by the unusual dimensions of the 
book) and the creation of an online supplement. Blevins stressed that the book 
presents a data-driven argument, that those maps “are foundational for its argu- 
ments, interpretations and larger narrative structure” (Blevins 2021: 7). The core 
data on the location of post offices is available online, accompanied by a detailed 
“data biography”. An online supplement presents a chronological mapping of the 
post office data accompanied by a brief narrative which explores features and
details of a beautifully designed map.21 However, that macro analysis is only one of
the threads of Blevins' argument: “Maps of the postal network provide a bird's
eye view of the network as a whole, but most 19th-century Americans didn't
interact with the US Post at thirty thousand feet. Paper Trails repeatedly descends to
ground level in order to see the system through their eyes. Individuals make up 
the narrative heart of this book" (Blevins 2021: 14). They are, however, missing 
from the digital supplement; a user can only zoom in and click on individual 
points to obtain details of a particular post office. So while Blevins presents his 
argument as “ultimately [...] defined by the intersection between these two 
scales, the ways in which large forces shape individual lives and how human ex- 
perience gives meaning to the structures that define our world", publishing that 
argument in print required that the large-scale visualizations be transformed from
their native digital form into static images and published online apart from the 
individual scale (Blevins 2021: 15). Blevins managed that combination adroitly, re- 
flecting his skill as a writer and his experience as a leading digital historian, but 
the resulting hybrid form inevitably imposed limits on the nature of the digital 
argument. (However, it brought other benefits: time to research and write rather 
than find a digital format; an audience that reads and reviews books; and a publi- 
cation in a print format that still has more professional standing and is more 
sustainable).22

Digital maps represented an option for presenting Harlem in Disorder, which 
like Paper Trails is centered on mapping data. Since the early 2000s, digital
historians have been experimenting with online platforms to reimagine mapping in


21 Online supplements, many based around maps, appeared in the US in the 2010s, to provide
resources related to scholarly monographs. Some of those projects used digital methods, but
unlike Paper Trails, they did not make digital arguments. For examples of online supplements in
Southern history publications, see Blevins and Hyman 2022: 89–90.

22 Cameron Blevins, email message to author, May 15, 2022.

humanities terms as a means of combining scales and presenting multiple threads
of argument. One result has been a series of edited collections theorizing deep 
maps and spatial narratives produced by groups of scholars convened by David 
Bodenhamer, John Corrigan and Trevor Harris (2010; 2015; 2021). In his most re- 
cent definition of deep maps, Bodenhamer argued that they “link time and space 
(chronotope), operate across multiple scales of time and space, embody multiple 
agents and multiple perspectives, recognize alternate schemes and emergent real- 
ities, foster a dynamic context that reveals movement and linkage, and (ideally) 
are emotional and experiential” (Bodenhamer 2021: 7). Deep maps thus fit the 
multiplicity central to digital history arguments in their “multiplicity of sources 
used, of perspectives represented, of experiences captured, of interfaces used”, 
and “enable multiple meanings to be made by its users”, as Lincoln Mullen and I 
put it (Robertson and Mullen 2021: 132). Moreover, deep maps provide the context 
for argument, with users navigating through a narrative created by the author. 
Even as such spatial narratives involve a selected pathway, elements not included 
in that pathway remain on the map and around the narrative — more visible than 
the contents of footnotes, appendices and indexes — so that users can see other 
possible paths even if those alternative narratives have not been constructed 
(Robertson and Mullen 2021: 136-137). In that way, spatial narratives differ from 
the strictly linear narratives that are the output of StoryMap tools tied to Geo- 
graphic Information Systems and their 2D maps and social science methodologies 
(Robertson and Mullen 2021: 135). 

While technologies exist to extend GIS to allow “dynamic representations and 
interactive systems that will prompt an experiential, as well as rational, knowl- 
edge base", as Bodenhamer pointed out, realizing the possibilities of deep maps 
requires a means of using those technologies in combination, a platform that is 
“an environment embedded with tools to bring data into an explicit and direct
relationship within place and time", analyze that data and trace and present spa- 
tial narratives and arguments (Bodenhamer 2021: 4, 7). Such a platform does not 
yet exist. I did use Neatline to create a prototype during the preliminary stages of 
my project, an experiment in spatial narrative that was discussed in the chapter 
Lincoln Mullen and I contributed to Making Deep Maps. A set of tools for the digi- 
tal collection software Omeka, Neatline combines maps, a timeline and text and 
visual annotations to the map that support the manual creation of
interactive spatial arguments.23 My prototype site used annotations to elaborate
connections between points on the map and a modular narrative argument in 
waypoints, links located in a sidebar that open on the map and used the timeline 


23 "Neatline," accessed May 19, 2022, https://neatline.org/; Nowviskie et al. 2013: 692—699. 
as the means of navigating the spatial narrative. Waypoints can be associated
with a zoom level centered on a specific location; clicking on them moved you 
around the map. Annotations in the form of polygons and lines also shifted some 
of the argument into a visual form, directing attention to movement, direction, 
proximity and connection. However, waypoints can accommodate only a limited 
amount of text, and the complexity of the narrative required the waypoints be 
broken into groups and appear on a series of maps, limiting the scope of the spa- 
tial narrative (Robertson and Mullen 2021: 142-145). Neatline also did not offer an 
effective way to integrate the legal process and official investigation that followed 
the events of the disorder.24

If a fully realized spatial narrative is not yet possible, the vision Robert
Darnton offered in 1999 of an electronic book structured “in layers arranged like a
pyramid” has become more readily realized and reimagined as a form for digital
argument (Darnton 1999). Darnton conceived those layers as augmentations of a 
top narrative layer, providing thematic accounts, documentation, historiography, 
suggestions for classroom uses and commentary. As Ayers noted in 1999,
“Darnton’s vision, while exciting, is more archival than hypertextual. It elaborates upon
the traditional book but does not change the central narrative" (Ayers 1999: 4). In 
other words, he was not conceiving a form for data-driven argument; to the con- 
trary, Darnton explicitly stated that he was “not advocating the sheer accumula- 
tion of data, or arguing for links to databanks - so-called hyperlinks. These can 
amount to little more than an elaborate form of footnoting". 

Darnton's multi-layered form of publication has recently been reimagined 
for data-driven arguments. Introducing the Journal of Digital History, Andreas 
Fickers and Frédéric Clavert opened with Darnton's vision and echoed him in
describing their publication as “a multilayered publication platform in the field of
history that will offer new opportunities for showcasing data-driven scholarship 
and transmedia storytelling in the historical sciences" (Fickers and Clavert 2021). 
Their focus was on connecting argument with the methods of analysis on which it 
relies. As a response to “the need for a stronger transparency of how digital
infrastructures, tools and data shape historians' practices”, articles include a narration
layer, a hermeneutic layer and a data layer. The data layer contains both datasets 
and code thanks to an editorial system based on Jupyter notebooks. “The
methodological implications of using digital tools and data” are elaborated in the


24 David McClure, a lead developer on Neatline, later developed a custom platform, Grapl, that
builds out some of the features of Neatline to create “a unified, coherent threefold path linking
long-form scholarly text, data-driven maps, and richly mapped data – a framework for the
publication of both texts and datasets, held tightly together by the spatial logic of the map.” McClure
and Worthey 2019.

Hermeneutic layer. In the Narration layer the focus is on “transmedia storytelling,”
incorporating multiple media or platforms into the argument, which in practice
means combining text with visualizations: to date, charts, maps and 3D objects.25
As a model of digital argument, the Journal of Digital History goes some way
toward what my project needs, but as a vehicle for short-form argument it lacks the
multiple threads of argument, aggregation of data at different scales and granular 
examination of data creation. 

A more expansive multi-layered data-driven argument is used by Lincoln 
Mullen for America’s Public Bible: A Commentary, a website to be published by 
Stanford University Press. In an explanation of “How to use this website,” he also 
invoked Darnton to describe how “the different elements of the site form an inter- 
pretative pyramid, something like the e-books that Robert Darnton envisioned”. 
In contrast to the Journal of Digital History, scale serves as the basis of Mullen’s 
layers. Darnton did not explicitly address scale in describing his pyramid struc- 
ture, but his layers did represent “ever-wider components”, as Thomas put it in 
2001 (418). So too does Mullen’s structure, while also reflecting the form and pro- 
cess of his argument; namely, “identifying, visualizing, and studying quotations in 
American newspapers”. At the base is a dataset of millions of biblical quotations 
that appeared in newspapers, a selection of which are aggregated into trend lines 
showing the appearance of a verse over time in another layer. (Mullen’s data is
computationally generated, so its creation is dealt with in an essay on method,
but the pages which show aggregate trends include a measure of the certainty
with which the prediction model identified text matching the biblical verse, and
link to the newspaper pages on which they appear.) Two narrative layers offer
interpretations of that data: verse histories for a selection of the data, and essays
on broader questions in the history of the Bible in the United States and the most
popular verses and genres of verses.

Mullen’s layers of data, aggregate and narrative fit the form of my argument, 
but the different nature of my data changes the nature and relation of the layers in 
my publication. Mullen was using big data to examine public life in a period of 
over a century, while I am using a small dataset to understand an event that lasted 
less than a day and its immediate aftermath. Rather than text generated by compu- 
tational feature extraction and prediction from a single type of source, my event 
and prosecution data are handcrafted from multiple sources, including newspa- 
pers, legal records and a variety of information gathered in the official investiga- 
tion of conditions in Harlem. As the combination of sources for each data point 


25 “About the Journal of Digital History," Journal of Digital History, accessed May 19, 2022, 
https://journalofdigitalhistory.org. 
varies, each needs a narrative description of the interpretive choices made to
create it. Since the data model includes multiple categories, events appear in multiple
groupings, producing a denser web of connections within this layer than in Ameri- 
ca’s Public Bible, in which verses can be accessed only on the basis of how fre- 
quently they were quoted, in chronological order by the year of their peak rate of 
quotation and in biblical order. Mullen captured the different balance between 
data and narrative in the forms of our arguments when he labelled America’s Pub- 
lic Bible an “interactive scholarly work”, presumably evoking Thomas’ definition. 
As “hybrids of archival materials and tool components”, “Interactive scholarly 
works have a limited set of relatively homogeneous data, and they might include a 
textual component on the scale of a brief academic journal article”. The emphasis 
on narrative of Harlem in Disorder, by contrast, fits Thomas’ definition of a digital 
narrative: a “highly configured, deeply structured” “work of scholarly
interpretation or argument embedded within layers of evidence and citation”, with “explicit
hypertext structures” that “situate evidence and interpretation in ways that allow
readers to unpack the scholarly work” (Thomas 2015: 531-532).

Unlike the case with spatial narrative, the combination of technology for a digi- 
tal narrative does exist. Content management systems combine interfaces for pub- 
lishing, creating and editing content (including various media) with databases for 
storing and managing that content. My initial experiments involved working with 
WordPress, the most widely used open-source CMS. However, it became clear it 
would be difficult to build from scratch a structure for the relationships between 
different elements of my argument in that platform. An alternative open-source 
digital authoring and publishing platform that offered features to construct those 
relationships existed in Scalar, developed by the Alliance for Networking Visual 
Culture at USC.26 In the same spirit as the idea of deep maps, Scalar is based on
reimagining the database in humanities terms: a “speculative remapping of rigidly 
logical structures toward more conceptual ones, creating possibilities for many-to- 
many relations of diverse and varied kinds”, as Tara McPherson, part of the team 
that developed Scalar, put it. As “a platform for imagining relation”, Scalar aligns
with the form of digital argument (McPherson 2018: 223). For my project, the key 


26 The Born-Digital Scholarly Publishing: Resources and Roadmaps Institute organized by Brown 
University and funded by the NEH held in summer 2022 provided training in three platforms, 
WordPress, Scalar, and Manifold (a Mellon-funded e-book platform that supports digital media 
and iterative scholarship but is otherwise too closely modeled on the print book to be useful for 
my project). The other widely used open-source humanities CMS is Omeka. It fits an interactive 
scholarly work better than a digital narrative as its focus is on a digital collection, with metadata 
a central element of its design. Omeka’s exhibit building features do not provide the structures 
for creating relationships available in Scalar. 
feature is its flexible structure, which “allows you to model conceptual structures
in a variety of ways, exploring the full capacity for various sequences and group- 
ings” (McPherson 2018: 197). That flexibility is also crucial to the iterative nature of 
humanities analysis of data. As McPherson notes, Scalar “doesn’t demand that 
scholars have mapped a rigid data structure in advance of authoring, which allows 
for flexibility in the relation between different components of a project” (McPher- 
son 2018: 193). And equally important, Scalar supports arguments that move across 
scales, “as scholars are able to move from the microlevel of a project (perhaps a 
single image or video annotation) to the structure of the entire project and its inte- 
grated media. The researcher can create careful readings within a project of many 
components that can also be instantly represented as a whole collection” (McPher- 
son 2018: 211). 

The semantic elements of Scalar used to create relations are paths and tags. 
Paths are sets of sequential pages, which can contain subpaths, branching narra- 
tives. Tags create categorical groupings, which can also refer to other tags to create 
linked groupings (McPherson 2018: 197). The tag is the key to creating relationships 
that group the modular elements of a Scalar site into scales of analysis and link 
and interweave data, analysis and narrative. In other platforms, clicking on a tag
takes a user to a list of all of the pages that have that tag; in Scalar, clicking on a tag
takes you to a page, which in addition to identifying all the pages that have that
tag, can include text and media. As a page, a tag can be a site for analysis and
interpretation, not just connection and collection.
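
The distinction between paths and tags can be made concrete with a minimal sketch. It is an illustration only, not Scalar's actual data model or API: the `Page` class, its fields and the `tags_of` helper are hypothetical names, and the event page is invented. The point it shows is that a tag is itself a page, so it can carry text as well as a list of the pages it groups, and can group other tags.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare pages by identity, not by content
class Page:
    """A unit of content. Paths and tags are pages too (hypothetical model)."""
    title: str
    text: str = ""
    path_items: list = field(default_factory=list)  # ordered sequence of pages
    tagged: list = field(default_factory=list)      # unordered grouping of pages

    def add_to_path(self, page):
        """Append a page to this page's sequential path."""
        self.path_items.append(page)

    def tag(self, page):
        """Group a page (which may itself be a tag) under this page."""
        if page not in self.tagged:
            self.tagged.append(page)


def tags_of(page, all_pages):
    """Return the titles of every page that tags `page`."""
    return [p.title for p in all_pages if page in p.tagged]


# A tag page holds its own analysis text as well as its grouping.
assaults = Page("Assaults", text="How the category was defined ...")
by_police = Page("Assaults by police", text="A gap in the data ...")
event = Page("Hypothetical event page")
narrative = Page("Chronological narrative")

assaults.tag(event)       # tag groups an event
assaults.tag(by_police)   # tag refers to another tag
narrative.add_to_path(event)

all_pages = [assaults, by_police, event, narrative]
```

In this sketch the same event page sits on a path (the narrative) and under a tag (the category), which is the many-to-many relation the platform is described as supporting.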


3 A multi-layered, hyperlinked microhistory that 
connects different scales of analysis 


Harlem in Disorder uses Scalar’s paths to create a linear narrative: a chronologi- 
cal description of the events of the night of March 19, 1935, with maps of their 
location, the prosecutions that spanned the following several months and the in- 
vestigation that culminated in a report submitted to New York City’s Mayor more 
than a year later. Additional layers of argument examine categories of events, 
stages and outcomes in the legal process and the forms and reporting of the inves- 
tigation. Those groupings aggregate events at different scales, extending the spa- 
tial grouping of events provided by the map. The data on the events of the 
disorder and the subsequent prosecutions are a further layer, presented not as a 
dataset, but as a set of pages that describe the interpretive choices made in creat- 
ing each data point from multiple sources. Included are data that cannot be fitted 
to arguments made in the narrative: events without information on timing, so not
in the chronology; events that have no location, so are not on the map; and events
that appear only as prosecutions in court, neither in the chronology nor on the
map. The final layer consists of the sources, each a page to which all the notes citing that
source are linked. The sources themselves are not part of the site; they would fit 
naturally into the note pages, but the published sources are still under copyright 
and access to the legal records and other archival sources is restricted. 

Links between those multiple layers take the form of tags. Employing the
categories in the data model as tags for events makes them a visible part of the
multiple narratives in the site, creating relationships that group events for analysis at
different scales and linking data, analysis and chronological narrative. The data 
model itself is unremarkable; it involved none of the challenges of modeling 
events in the lives of the enslaved, for example. However, once moved from a dataset
to a narrative, the categories of the data model become part of the structure of the
argument and make visible how the argument is data-driven. 

Tags for timing and location serve to construct a narrative that encompasses 
multiple events happening in different places at the same time, as happens in racial 
disorder. Tags for timing group individual events into sections of the chronological 
narrative each covering a thirty-minute period. The ambiguity and uncertainty of 
data on time precludes using shorter spans. Tags grouping events by location might 
seem redundant given the mapping of the data. However, they are a necessary con- 
comitant, a means of addressing the uncertainty and ambiguity not visible in a 
point on the map. Grouping events by location also incorporates events for which 
there is no information on timing. The tags group events by the city block on which
they occurred (or as occurring at an unknown location), creating narratives
that branch off the chronological narrative. 
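
The two groupings described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual workflow: the function names are hypothetical, and the four events, their times and their block labels are invented placeholders rather than data from the project. Only the date of the disorder and the thirty-minute span come from the text.

```python
from collections import defaultdict
from datetime import datetime


def window_start(when):
    """Round a time down to the start of its thirty-minute window."""
    return when.replace(minute=(when.minute // 30) * 30, second=0, microsecond=0)


def group_events(events):
    """Group (label, time or None, block or None) tuples by time and by block.

    Events without a time are left out of the chronology but still appear
    in the location grouping; events without a block are grouped under
    "unknown location".
    """
    by_time = defaultdict(list)
    by_block = defaultdict(list)
    for label, when, block in events:
        if when is not None:
            by_time[window_start(when)].append(label)
        by_block[block or "unknown location"].append(label)
    return dict(by_time), dict(by_block)


# Invented placeholder events on the night of March 19, 1935.
events = [
    ("event-1", datetime(1935, 3, 19, 21, 10), "block-A"),
    ("event-2", datetime(1935, 3, 19, 21, 25), "block-B"),
    ("event-3", None, "block-A"),                       # no timing information
    ("event-4", datetime(1935, 3, 19, 21, 40), None),   # no known location
]
by_time, by_block = group_events(events)
```

The untimed event drops out of `by_time` but remains in `by_block`, which is why the location grouping is not redundant with the map.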

Tags for events in the disorder classify them as one of nine types, with addi- 
tional tags that group them based on different features of those types, who was 
involved, and whether an arrest or prosecution followed. The label for each tag 
includes the number of events in that group, to give a sense of context at a glance 
(e.g., “Assaults (54)”). The subtags for the five largest groups of events used in the
project are listed in the column under the event type tag in Table 1. The Assault 
tag page, for example, includes links to the fifty-four events grouped as assaults. 
It also features fifteen related tags into which those events are grouped. Six tags 
are forms of assault, six tags are based on the identities of the alleged victim and 
two tags are based on the police response. There is also a tag for assaults by po- 
lice, to highlight a gap in the data: the absence of specific incidents of violence by 
police notwithstanding widespread statements about police beating and shooting 
at people on the streets. Additional tags group those who were injured in alleged 
assaults and alleged assaults that resulted in arrests and those that resulted in 
prosecutions. In combination, these subtags create a dense web of relationships
between the events grouped as assaults. The page itself contains a discussion of
how the category was defined, of the patterns in the types of assault and who was 
involved and how they departed from violence in Harlem in 1935, summarizing 
the more detailed discussions on the tag pages for those categories and contexts. 
An interactive map of the events of the disorder that can be located, with a layer 
consisting of the assaults highlighted, anchors a discussion of patterns in where 
this group of events occurred, followed by a discussion of patterns in when they 
occurred. Finally, there is a discussion of how other historians have interpreted 
this type of event as part of disorder in Harlem. This organization is replicated 
for each of the related tags, which analyze smaller subgroups and point to the 
relations between them. 

Employed in that way, the tags bring patterns in the disorder into focus at 
multiple scales. Classifying data simplifies it, somewhat reducing its complexity. 
However, in this form of digital argument the more complex, disaggregated view 
of the events is still accessible in the maps and in the layer of pages discussing 
individual events that appear on each tag page. 

Pages for individual events offer a different perspective on complexity. Most 
of those pages have multiple tags highlighting different facets of what happened 
during the disorder and who was involved and creating a point of connection to 
those arguments (Figure 1). Pages for individual arrest events also have tags
reflecting the categories from the data model for prosecutions: they are grouped by
the event from which they resulted, the identity of the defendant, the charge, in 
which courts prosecutions took place, the outcome and the sentence (Figure 2). 
Arrests are a separate category of event to counter the tendency to assume those 
arrested are guilty or at least involved in the related event. For Harlem in 1935 
that is a particularly questionable assumption given the practices of the predomi- 
nantly white police officers responsible for making those arrests. 

27 As I created the project, the density of relationships exposed limits in the capacity of Scalar.
The performance of the site slowed significantly, which discussions with Scalar's developers
revealed was caused by pages that included more than 100 relationships. That limit was not
mentioned in the software's documentation. As a result, the tags attached to upper-level category
pages had to be replaced with manually created links, which do not impose the same load as
tags. Tags identifying types of sources were also removed, replaced with manually created links
in the notes pages that access analysis of different sources.

Figure 1: Tags on a page for an event in the disorder.
Figure 2: Tags on a page for an arrest in the disorder, including tags for sections of the narrative of
prosecutions and categories of prosecutions.


The content of individual event pages provides the data's backstory, in greater
detail than the summary traces of the process of data creation that can be
left in a dataset. Each of the sources in which information on the event appears
is discussed, agreement and disagreement among them are identified, and decisions
about what information to use and why, and about how to categorize the event,
are explained. This analysis also highlights gaps in the information, flagging
when the categorization, timing or location of an event are uncertain to counter 
the apparent certainty of its appearance in the chronological narrative and on 
the map. The label for an event, where possible, names the participants: the 
name of the owner or staff member of a looted or damaged store or the name of 
an individual killed, injured or arrested. In some cases, Probation Department 
files, and census and draft records provide details of individual lives. Although 
fragmentary, such information offers some counterbalance to the reliance on
aggregate numbers to describe the disorder in the existing literature on Harlem in
1935 and the broader tendency of data to dehumanize.

If producing a page for every data point seems an unmanageable amount of 
work, it is worth remembering that the contents of these pages are necessarily 
part of data-driven historical research. They describe decisions that are required 
to create handcrafted (not computationally generated) data, information that is 
currently summarily recorded in some part of a dataset (or when evidence is not 
data, not systematically published at all). That displacement of the details of our 
analytic and interpretive processes fitted the scale of the print forms in which 
historians published. Absent those constraints, the importance of that information
for understanding and assessing an argument warrants bringing it into a closer
relation with narrative.


4 Conclusion: Reading digital argument 


The detailed granular data and dense web of relations that connect the multiple 
layers of Harlem in Disorder result in an argument that is larger in scale and 
more expansive than could be contained in a book. Given that unfamiliar struc- 
ture, it is important that Harlem in Disorder is still legible as a ‘book.’ The Table 
of Contents that is part of the design of Scalar highlights the chronological narra- 
tive, so that it serves as a spine for the publication. Following that narrative are 
sections analyzing newspaper coverage, individual events and contexts for those 
events beyond the disorder - sections that could be understood as appendices. 
Strictly adhering to that linear path through the content would leave the narra- 
tive alone to convey the complexity of the disorder. Reading in that way, how- 
ever, seems an unlikely response to Harlem in Disorder; it would require ignoring 
links and tags in those pages that connect to the individual events discussed in 
each section of the chronological narrative and to pathways to other layers of 
narrative. With almost twenty more years of experience with hyperlinks
than those who encountered the early experiments in digital argument, readers
are likely to have a greater degree of comfort with them. At the same time, given
the content of the layers linked to the narrative, readers interested in the process
of historical research and interpretation are more likely to engage with those 
threads of the argument. In that regard, the project has been conceived as a 
primer of sorts on historical method of interest to an audience beyond those who 
work in the fields of African American history and digital history. By exposing 
layers of historical analysis and interpretation it fills a gap in teaching between 
the primary source and the secondary source.

For readers who do follow links, the chronological narrative will be a touchstone.
After starting by following that narrative, a reader will depart from it at a point 
of interest to zoom in on a specific event and arrive at a page that contains tags 
that link to more information about events related to that event in some way and 
notes that link to sources. Clicking on a tag would take the reader to a different 
layer of argument, analyzing a category of events or a type of source. In moving 
across individual events and different groupings of events, that pathway would 
embody the complexity of the disorder while also highlighting how the shape of 
that argument is the product of choices about how to categorize events identified 
in the sources. Although they are navigating across as well as along narrative 
layers, readers should not become lost, as many felt in the early digital argu- 
ments. Whatever page they are on, Scalar's menu provides multiple options for 
finding their way: they can click on the compass icon (labelled B in Figure 3) to 
access a list of recent pages on which to retrace their path; or they can click on 
the list icon to the left of the compass (labelled A in Figure 3) to access a table of 
contents that provides a way to return to a section in the chronological narrative 
or category or event in the other layers of argument, or to the thematic introduc- 
tion that lays out the overall argument. Read in this way, the argument does not 
unfold sequentially as it would in a book. Instead, the digital argument is woven 
from threads drawn out of the narrative and intertwined until they form a fabric 
whose pattern shows the complex mix that characterized racial violence in Har- 
lem in 1935. 
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of Open Source Network Analysis Tools 


Abstract: As a contribution to a critical hermeneutics of data visualization, this 
chapter presents a critical examination of the default aesthetics used in three 
open-source network analysis software packages: Pajek, Cytoscape and Gephi. 
The aesthetics of network graphs are produced by the selection of the algorithm, 
algorithm-specific parameters, the visualization of statistical measures and the el- 
ements of line, shape and color that are applied. In this study, visualizations of 
the character interaction network from Victor Hugo’s novel Les Misérables are 
generated using the default parameters available in each software package, in 
order to explore the assumptions embedded in the tool and contribute to a hu- 
manist analysis of network analysis software and its outputs. Because network 
visualizations assist in understanding data structures at a variety of scales, criti- 
cal awareness of how network graphs are produced contributes to a more situ- 
ated, self-reflective data visualization practice that recognizes how aesthetics are 
always creating meaning. 


Keywords: network analysis, data visualization, aesthetics 


1 Introduction 


Today we live in a world marked by physical, technological, political and social net- 
works that operate at a scale beyond human comprehension. The rapid growth of 
the internet as both the dominant popular concept of a network and an important 
material, infrastructural network is but one example, and as Fredric Jameson sug- 
gests, contemporary technology is “mesmerizing and fascinating not so much in its 
own right but because it seems to offer some privileged representational shorthand 
for grasping a network of power and control even more difficult for our minds and 
imaginations to grasp: the whole new decentered global network of the third stage 
of capital itself” (Jameson 1991: 37). Within this “multinational and decentered” com- 
plexity, human perception struggles, according to Jameson, “to locate itself, to orga- 
nize its immediate surroundings perceptually, and cognitively to map its position” 
(Jameson 1991: 44). Drawing on theories of urban space and Althusser’s definition of 
ideology, Jameson theorizes cognitive mapping as an attempt to locate the subject in 
the “vaster and properly unrepresentable totality” of postmodern society, which he
describes as marked by “spatial as well as [. . .] social confusion” (Jameson 1991: 51,
54). In the decades since Jameson first explored these ideas, their relevance has only
become clearer. What he identified as the literature of “high tech paranoia” in spy
novels and early cyberpunk fiction has become quite resonant with reality: “the
circuits and networks of some putative global computer hookup are narratively
mobilized by labyrinthine conspiracies of autonomous but deadly interlocking and
competing information agencies” (Jameson 1991: 38). Ruth Ahnert et al. suggest that
competing information organizations became visible in popular culture with the in- 
vestigation into global terrorist networks following 9/11 and the founding of Face- 
book in 2004, which brought once-obscure concepts and methods of network 
science into the popular vocabulary. They identify “the network turn” at the
beginning of the twenty-first century as “a whole host of converging thoughts and
practices around the turn of the new millennium – the zeitgeist of the networked age”
(Ahnert et al. 2020: 3). Kieran Healy also points to the expansion of both material
and symbolic networks during this period: “The rapid development of computing
power, the infrastructure of the Internet, and the protocols of the World Wide Web,
together transformed the capacity to construct, visualize, analyze and build
networked systems in practice. They were also accompanied by a major shift in the
cultural salience of network imagery” (Healy 2015: 186). Network theory has been
extensively operationalized in recent decades so as to substantially modify the world 
we inhabit and our everyday experience of it (Healy 2015: 195-198). Thus, as Jameson 
suggests, “cognitive mapping cannot (at least in our time) involve anything so easy 
as a map [. . .] mapping has ceased to be achievable by means of maps themselves" 
(Jameson 1991: 409). Instead we turn to networks, and particularly to network 
graphs, to understand our position, whether at a subway stop in an unfamiliar city 
or within a professional discourse community on social media (Derrible and Ken- 
nedy 2009: 17-25; Grandjean 2016). Network visualizations enable the viewer to per- 
ceive relationships at a variety of scales at the same time: we can holistically 
perceive the overall structures that connect an enormous set of data points and we 
can also zoom in on certain nodes, either with the assistance of technological tools 
or simply by focusing our attention on a portion of the graph. 

Not surprisingly, the visualization and analysis of network data have become 
an important method for research in the digital humanities during recent decades. 
Projects like Six Degrees of Francis Bacon, Kindred Britain, and The Viral Texts 
Project have demonstrated how network visualizations can help researchers and 
students understand the scale, scope, and spread of historical relationships among
people and texts. Since the early 2000s, the availability of large-scale cultural data
has increased with the development of mass digitization libraries like the Internet 
Archive, Google Books, and the Hathi Trust Research Library. Data visualization of 
all kinds has become more widely used in humanities scholarship as well as in 
journalism. Digital humanities training courses and workshops often focus on the 
use of software tools for data visualization and analysis, including network analysis 
tools like Gephi and NodeXL. These developments have encouraged increasing 
numbers of humanities researchers to use network visualizations for exploratory 
analysis of many different kinds of large-scale data, such as contemporary or his- 
torical social networks; relationships among characters in books, television or film; 
and economic or political relationships among individuals, organizations and insti- 
tutions. However, the implications of the aesthetic choices produced by network 
visualization tools are not yet widely understood. 

Traditional scientific approaches to data visualization focus on accuracy and 
clarity, assuming that graphs and charts can represent a set of data and therefore 
the phenomena to which it refers (Tufte 1983). However, as Johanna Drucker suggests,
both the data itself and the methods for its visualization need to be examined
through “humanistic inquiry,” which “acknowledges the situated, partial,
and constitutive character of knowledge production, the recognition that
knowledge is constructed, taken, not simply given as a natural representation of
preexisting fact” (Drucker 2011: para. 3). There are no natural representations: the
recognition, presentation, and shaping of knowledge into information is an ideo- 
logical act that reflects many interpretive choices on the part of its creators that 
are too often hidden from view to end users. As Lev Manovich says, “data does
not just exist — it has to be generated" (Manovich 2001: 224). 

Interpretive choices are also embedded in the tools used to create visualizations.
As Ahnert et al. point out, “we do need to be aware of the assumptions
encoded in the tools we use so that we can bend them to our own needs” (Ahnert
et al. 2020: 64). This is especially relevant for interdisciplinary researchers who
often use methodologies and tools from other disciplines and apply them to hu- 
manities contexts. Data visualization tools are often designed for specific pur- 
poses and research communities and their affordances reflect the values and 
activities of those communities. 

1 “Six Degrees of Francis Bacon,” accessed December 15, 2021, http://www.sixdegreesoffrancisbacon.com;
“Kindred Britain,” accessed December 15, 2021, https://kindred.stanford.edu/; and “The
Viral Texts Project,” accessed December 15, 2021, https://viraltexts.org/.

Current network analysis tools make it possible for researchers to visually
explore, filter and manipulate data at a variety of scales. The results of those
operations can be output in structured data tables as well as in graphical images.
Because network visualization software provides for the creation and comparison 
of multiple views of the data, it can support humanistic research, which “consists
not of converging toward a single interpretation that cannot be challenged but
rather of examining the objects of study from as many reasonable and original
perspectives as possible to develop convincing interpretations” (Sinclair et al.
2013: para. 1). However, the outputs from such software should be presented
within a critical framework that also reflects on the creation of the data, the
layout algorithm and specific choices made within the software. Ignoring these
elements leads Alexander Galloway to claim that the visual similarities among “four
different maps of the internet, produced by different methods and sources,
selected from numerous examples available via a normal web search” mean that
“only one visualization has ever been made of an information network, for there
can be only one” (Galloway 2011: 89-90). By presenting these graphs at very small
scale and without citation to their source, Galloway limits deeper inquiry into the 
techniques and technology of their creation. However, it appears as though at 
least three of the four were generated with the same layout algorithm and with 
community detection applied as colors to the nodes and edges. Galloway ignores 
these visualization choices and describes these networks as endlessly repetitive 
because he sees them as part of a “positivistic dominant of reductive, systemic 
efficiency and expediency” (Galloway 2011: 100). Yet as Drucker suggests, critically
examining the methods that produce data visualization can help to dismantle its
ideological mystification: “The apparently neutral declarative statements of
interface and data display share an ideological agenda to simply appear to be what is.
Taking apart the pseudo-transparency by showing the workings and apparatus of
the interface and graphical display of data is a crucial act of hermeneutics applied
to information displays and systems” (Drucker 2020: 132).

As a contribution to this critical hermeneutics of data visualization, this chap- 
ter presents a critical examination of the default aesthetics used in three open- 
source network analysis software packages: Pajek, Cytoscape and Gephi. Many 
different choices affect the aesthetics of network graphs, including the selection 
of the algorithm, algorithm-specific parameters, the visualization of statistical 
measures, and the elements of line, shape and color that are applied. In network 
analysis software tools, some of these choices are implemented by default and 
others are available for researchers to choose from. In this study, visualizations 
of the classic Les Misérables character interaction network are generated using 
the default parameters available in each software package, in order to explore 
the assumptions embedded in the tool and contribute to a humanist analysis of 
network analysis software and its outputs.


2 Contexts


2.1 Tool criticism


Along with Drucker and Ahnert, a number of scholars have called for the critical 
examination of digital tools. Karin van Es suggests that tool criticism “reflects on 
how the tool (e.g., its data source, working mechanisms, anticipated use, interface, 
and embedded assumptions) affects the user, the research process and output, and 
its reliance on the user’s training” (van Es et al. 2021: 52). Tools must be understood 
in the context of how they are being used, especially as they are adapted by re- 
searchers in the humanities. The incredible capabilities of today's software bring 
network visualization of large-scale data within reach of many users, but as Bern- 
hard Rieder and Theo Röhle suggest, this power can mean that users “happily pro- 
duce network diagrams without having acquired robust understanding of the 
concepts and techniques the software mobilizes” (Rieder and Röhle 2017: 118). Scott 
Weingart offers a simple example: although a user can load a bimodal network 
(containing nodes that represent two different types of entities, like authors and 
books) in Gephi and run the centrality calculation on it, the implementation of 
node centrality that is built into the tool is designed only for unimodal networks. 
Thus “although the network loads into Gephi perfectly fine, and although the cen- 
trality algorithm runs smoothly, the resulting numbers do not mean what they usu- 
ally mean” (Weingart 2011, emphasis in original). Understanding how a given tool 
works and thinking critically about it can assist in working with “humanistic data,” 
which Weingart characterizes as “uncertain, open to interpretation, flexible, and 
not easily definable” (Weingart 2011). Marijn Koolen argues that even if researchers
lack the mathematical knowledge to critically review the algorithms behind the
tools they use, “tool criticism should analyze and discuss tools at the level of data
transformations [. . .] how inputs and outputs differ and what this means for
interpreting the transformed data” (Koolen 2019: 382). This is particularly relevant for
network visualization, which transforms a mathematical matrix into a spatial 
representation. 
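
Weingart's bimodal example can be illustrated at the level of data transformations with a small sketch. It makes no claim about how Gephi computes centrality: both functions are hypothetical, written only to show how a single assumption (what counts as a possible neighbour) changes the meaning of the resulting number. The author and book names are placeholders. The one-mode version divides a node's degree by the number of all other nodes; the two-mode version divides by the size of the opposite set, since an author can only be linked to books.

```python
def degree_counts(edges):
    """Count how many edges touch each node."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def degree_centrality_unimodal(edges):
    """Degree divided by (n - 1): assumes any node could link to any other."""
    degree = degree_counts(edges)
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


def degree_centrality_bimodal(edges):
    """Degree divided by the size of the opposite mode.

    Assumes each edge is an (author, book) pair, so authors can only
    link to books and books only to authors.
    """
    degree = degree_counts(edges)
    authors = {a for a, _ in edges}
    books = {b for _, b in edges}
    scores = {a: degree[a] / len(books) for a in authors}
    scores.update({b: degree[b] / len(authors) for b in books})
    return scores


# Placeholder two-mode network: author-1 is linked to every book.
edges = [
    ("author-1", "book-1"),
    ("author-1", "book-2"),
    ("author-1", "book-3"),
    ("author-2", "book-3"),
]
one_mode = degree_centrality_unimodal(edges)
two_mode = degree_centrality_bimodal(edges)
```

Under the one-mode assumption, author-1 scores 0.75 even though it is connected to every node it could possibly be connected to; under the two-mode assumption it scores 1.0. Both calculations run without error on the same input, which is the situation the passage above warns about.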


2.2 Network visualization 


Network analysis is predicated on the assumption that understanding the connec- 
tions between entities leads to greater understanding of a dataset. Although the 
statistical analysis of networks originated with studies of physical systems, such 
as the roadways within a city or machines on a computer network, network analysis
was soon also applied to figurative connections between people as seen in
club memberships or correspondence networks (Wilson 1986; Zachary 1977).
Visualizations of such networks represent entities as nodes, or vertices, and the
connections between them as edges. Network graphs are thus a subset of the larger
category of node-link diagrams, which use a common “set of abstractions [. . .] so 
close to ubiquitous that it can be called a visual grammar. Entities are almost al- 
ways shown using outline boxes, circles or small symbols. The connecting lines 
generally represent different kinds of relationships, transitions or communication 
paths between nodes” (Ware 2013: 222). A variety of statistical measures are used 
to describe network graphs, including node degree, edge weight, modularity and 
measures of centrality. These measures help identify important nodes and com- 
munities within a network and the ways that information, power or prestige flow 
between them. However, the visualization of a network “can allow users to see 
relationships, such as patterns and outliers, that would not be apparent through a 
metrics-based analysis alone” (Gibson et al. 2012: 325). This is especially important 
in visualizations of large-scale datasets that are difficult to comprehend in numer- 
ical terms. 

Since 1984, a number of different algorithms have been developed to gener- 
ate force-directed network graphs, which are widely used today especially for 
very large datasets. Force-directed graphs visualize the nodes in a network as if 
they were powered by spring or electrical forces of attraction and repulsion (Ko- 
bourov 2013). Nodes that are more closely interconnected are displayed closer to- 
gether, near the center of the graph and nodes that are less connected are 
dispersed outwards to its margins. It is important to recognize, however, that 
these spatial representations do not have a direct relationship to the data, as
Ahnert et al. point out: “Networks express an internal logic of relationships between
entities that is inherently intuitive. They also lack an explicit external spatial
referent, whether the latitude and longitude of cartography, the scale and sequence
of a timeline, or the categories and measures that mark the x-y axis of a statistical
graph” (Ahnert et al. 2020: 57). The meaning of a network's layout can only be
understood within its own visual codes. 

Because most force-directed layouts draw the layout iteratively, starting from 
randomly selected nodes, the resulting graph will look somewhat different each 
time a user runs a specific algorithm, even with the same parameters selected. 
Tommaso Venturini suggests that the term “spatialization” is more appropriate 
than “visualization”, since “Force-directed layouts do not just project networks in
space—they create a space that would not exist without them [. . .] In a force- 
spatialized visualization there are no axes and no coordinates, and yet the rela- 
tive positioning of nodes is significant" (Venturini et al. 2021: 3). Within this space, 
looking for indications of polarization, density, and clustering help the researcher 
to develop interpretations of the data (Venturini et al 2021: 4; Gibson et al. 2012: 
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345). However, users are likely to interpret node position as having more signifi- 
cance than it does: in a layout of a network containing three clusters, sometimes 
clusters 1 and 2 will be closer together, and sometimes clusters 1 and 3 will be 
closer (Gibson et al. 2012: 324). To counteract this mistaken perception, a key prac- 
tice of exploratory data visualization is to produce multiple drawings of a net- 
work in order to see which aspects of its structure persist; this process can be 
enhanced or constrained by the visualization's aesthetics. 
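
The run-to-run variation described above can be illustrated with a minimal sketch of a spring-and-repulsion layout in Python. This is not the implementation used by Pajek, Cytoscape or Gephi; it is a simplified loop in the spirit of Fruchterman and Reingold's algorithm, and the toy graph (two four-node cliques joined by a single edge), the function names and all parameter values are assumptions made for illustration. Drawing the same network from two different random seeds should yield different coordinates, while the separation between the two clusters persists.

```python
import itertools
import math
import random


def force_directed_layout(nodes, edges, seed, iterations=300):
    """Simplified force-directed layout (illustrative only, not any tool's code)."""
    rng = random.Random(seed)
    # Random starting positions are the source of run-to-run variation.
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = math.sqrt(4.0 / len(nodes))  # assumed ideal distance between nodes
    temperature = 0.2                # assumed maximum step, cooled every round

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels (the "electrical" force).
        for u, v in itertools.combinations(nodes, 2):
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            push = k * k / dist
            disp[u][0] += dx / dist * push
            disp[u][1] += dy / dist * push
            disp[v][0] -= dx / dist * push
            disp[v][1] -= dy / dist * push
        # Connected nodes attract (the "spring" force).
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull
        # Move each node along its net force, capped by the current temperature.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98  # cooling, so that the drawing settles
    return pos


def mean_distance(pos, pairs):
    """Average Euclidean distance over the given node pairs."""
    pairs = list(pairs)
    return sum(math.dist(pos[u], pos[v]) for u, v in pairs) / len(pairs)


# Hypothetical toy network: two tightly knit groups joined by one edge.
cluster_a = ["a0", "a1", "a2", "a3"]
cluster_b = ["b0", "b1", "b2", "b3"]
nodes = cluster_a + cluster_b
edges = (list(itertools.combinations(cluster_a, 2))
         + list(itertools.combinations(cluster_b, 2))
         + [("a0", "b0")])

for seed in (1, 2):
    pos = force_directed_layout(nodes, edges, seed)
    within = mean_distance(pos, itertools.combinations(cluster_a, 2))
    between = mean_distance(pos, itertools.product(cluster_a, cluster_b))
    print(f"seed {seed}: a0 at ({pos['a0'][0]:.2f}, {pos['a0'][1]:.2f}), "
          f"within-cluster {within:.2f}, between-cluster {between:.2f}")
```

Comparing the printed values across seeds is a small-scale version of the practice recommended above: the coordinates of any single node carry no meaning, whereas the relation between within-cluster and between-cluster distances is the kind of structure that can be expected to recur across drawings.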


2.3 Meaningful aesthetics 


Within the context of data visualization, visual elements such as color, shape, 
symmetry and size contribute to the viewer's perception of both the graph's 
meaning and its overall beauty or aesthetic impact. Aesthetics are considered to 
be deeply related to the communication of meaning within graph drawing: “Creating
aesthetically appealing graphs is more than a quest for the beautiful – it has
the practical aim of revealing underlying meaning and structure. In general,
researchers associate aesthetics with readability, and readability with
understanding” (Bennet et al. 2007: 57). Although contemporary neurobiology has offered
different explanations of the underlying mechanisms of perception, the Gestalt 
principles of pattern perception continue to be relevant for understanding how 
we perceive visual designs. These principles include proximity, similarity, relative 
size and continuity, which explain that viewers are likely to perceive objects that 
are similar and/or near each other as constituting a group; to correlate differen- 
ces in size with differences in quantity or strength; and to perceive connection 
from continuous lines (Ware 2013: 181–186). Many guidelines and tools for data
visualization incorporate these fundamental principles of perception. A recent in- 
vestigation into viewers' subjective rating of the beauty and interest evoked by 
randomly generated network graphs suggests that curved shapes are more likely 
to be perceived as beautiful, which corresponds with prior research in consumer 
design focused on material objects. More complex structures were rated as more 
interesting as visual stimuli. The researchers suggest that the combination of
beauty and interest is important in designing network graphs, as interest is
required to engage attention for longer periods of time (Carbon et al. 2018).

The aesthetics of network graphs are produced by the combination of the net- 
work layout algorithm and the visual design choices selected in the visualization 
software. Within computer science, “aesthetically pleasing” layouts have been an
explicit goal of force-directed algorithms since at least 1984, when Peter Eades
specified that edges should be the same length and that the graph layout should
be symmetrical (Kobourov 2013: 385). Later algorithms would add other criteria,
such as the even distribution of nodes, the minimization of edge crossings, and
node separation and non-overlap (Gibson et al. 2012: 22). These aesthetic criteria
are thought to “ensure that a graph is displayed more effectively and allow the
user to easily perceive the topological structure of a graph”, but as Helen Gibson
points out, in practice these principles sometimes conflict with one another and
with users’ subjective perceptions of the graphs (Gibson et al. 2012: 326–330).²
A variety of visual design choices are available in network visualization tools,
including node and
edge color; node size and shape; and line styles and widths. These choices can be 
used to represent different features of the nodes and edges, whether present in the 
original data or calculated through statistical analysis of the network. These mean- 
ingful aesthetics help users gain insight about the relationship between node attrib- 
utes and the overall topology of the network (Gibson et al. 2012: 341). 

Even before a user interprets these visual cues in relation to the underlying 
data, they are likely impacted by the overall aesthetic appearance of any graph, as 
Helen Kennedy and Martin Engebretsen suggest: “Our encounters with form, col- 
our, and composition are informed by bodily experience as well as aesthetic judge- 
ment [...] Data visualizations thus create meanings through visual and other 
codes. But they also generate feelings, by which we mean the emotional responses 
that are connected to human encounters with data visualizations. Meanings and 
feelings are inseparable in our situated interactions with texts” (Kennedy and
Engebretsen 2020: 23–24).³ Recent art exhibits featuring network diagrams foreground
these emotional responses to network aesthetics, but they should also be explored
as part of humanist knowledge creation.⁴


3 Method 


This paper presents a study of network visualization tools and their outputs.⁵
Because understanding the design history of visualization tools and the creation of
data used for analysis is integral to the humanist critique of the apparently objec- 
tive appearance of traditional data visualization, this section describes the tools 
and data used in this study. 


2 For examples of empirical user studies, see Purchase 2002 and Purchase et al. 2002. 

3 See Kennedy and Hill 2018 for a user study focused on these emotional responses. 

4 See, for example, The Art of Networks III, held at https://www.barabasilab.com/art/exhibitions. 
5 A number of comparative studies exist, but most focus on technical rather than
aesthetic comparisons. See, for example, Broasca et al. 2019; Combe et al. 2010; and
Majeed et al. 2020.


3.1 Network analysis tools


Three network analysis tools (Pajek, Cytoscape, and Gephi) were selected because 
they are cross-platform, open-access tools capable of visualizing very large net- 
works. These are all mature tools in continuous development, with active user 
communities and wide adoption by researchers. Additionally, these tools are
considered more comparable with each other from the user perspective because
each provides a graphical user interface (GUI) and thus does not require knowledge
of a programming language like Python or R. All of these tools will help users
seeking to understand the relationships among entities in a dataset, but they will 
each do that work somewhat differently. 


3.1.1 Pajek 


Pajek was first released in January 1997 and remains in continued development to 
the present time. Pajek provides for a large number of analytic operations on six 
distinct data objects: network matrices or graphs; and five data objects that contain 
information about the nodes: “partitions” of nominal or ordinal properties;
“vectors” of numerical properties; “clusters,” or subsets from a partition;
“permutations” containing ranked properties; and “hierarchies” which represent
nodes in a tree diagram. As its developers admit, “Pajek is not ‘a one click
program’, some users call it the network calculator. That means that for obtaining
some result several basic operations must be executed in a sequence” (Mrvar and
Batagelj 2016: 2).
Although this design means that Pajek has a steeper learning curve than some 
other programs, it is very powerful: the main program can handle networks of up 
to one billion nodes, and there are two versions designed for enhanced memory 
optimization for processing even larger networks. Version 5.15a of Pajek64 (for
64-bit Windows systems), released in May 2022, was used in this study.


3.1.2 Cytoscape 


First released in July 2002, Cytoscape was specifically designed for “integrating
biomolecular interaction networks with high-throughput expression data and other
molecular states into a unified conceptual framework” (Shannon et al. 2003: 2498).
It provides for the integration of data from scientific databases of gene information
and a number of analyses specific to biological research. Over time it has developed
into “a general platform for complex network analysis and visualization” and the
project website highlights uses of the software in the social sciences and general
study of networks.⁶ Many additional capabilities can be installed into the framework
with apps developed by the Cytoscape team and the user community. Cytoscape is
designed for optimal rendering of networks with up to 100,000 nodes. Cytoscape
version 3.9.1, released in January 2022, was used in this study.


3.1.3 Gephi 


First released in July 2008, Gephi was designed as a “flexible, scalable and
user-friendly software” that would provide “better network visualization to both
experts and [an] uninitiated audience” (Bastian et al. 2009: 361). According to the
Gephi documentation site, “Gephi is a tool for people that have to explore and
understand graphs. Like Photoshop but for graphs, the user interacts with the
representation, [to] manipulate the structures, shapes and colors to reveal hidden
properties”.⁷ This design for user interactivity includes displaying the network
while the layout algorithm is running, direct manipulation of the shape of the graph
and options for changing its aesthetics. Although it includes statistical analysis
components, Gephi is explicitly described as “a software for Exploratory Data
Analysis” that can help users to hypothesize and “intuitively discover patterns” in
networks.⁸ Along with
continued developments to the primary software, a large number of optional plu- 
gins designed by the Gephi team and other users extend the capability of the plat- 
form. Gephi is capable of handling networks with up to 100,000 nodes and 1 million 
edges. Gephi version 0.9.5, released in May 2022, was used in this study. 


3.2 Les Misérables character interaction dataset 


Computer scientist Donald Knuth's 1994 book The Stanford GraphBase: A Platform 
for Combinatorial Computing included a number of datasets Knuth created from 
literary works (Knuth 1994: 12–14, 45–46, 180–191). One of these datasets lists 80
characters from Victor Hugo's 1862 novel Les Misérables and their interactions in
each chapter (Knuth 1994: 14). Although his dataset encompassed 80 characters,
he documented interactions among only 77 of them, creating a network containing
77 nodes and 254 edges.⁹ This dataset was selected for this study in part because
it has been widely used in network analysis scholarship, beginning with Newman
and Girvan’s community detection algorithm paper in 2004, and is freely available
online in a number of network data repositories (Newman and Girvan 2004). A graph
of this network is also included as an example within Gephi's installation files.

However, it is important to note that the collection of data, particularly from
cultural works like Hugo's novel, involves a number of decisions that are
unfortunately not well documented for Knuth’s dataset: he says only that “the
vertices of the graphs represent the characters that appear in well-known novels,
and the edges represent encounters between those characters” (Knuth 1994: 12). For
literary scholars, neither characters nor encounters are self-evident: key questions
include how a fictional character is defined for the purposes of the research (i.e.,
do they have to be named within the novel to count) and how their interactions
would be defined and documented (via direct and/or indirect speech, direct physical
actions, or simply being described as present in the same scene of the novel)
(Moretti 2011). For example, Michal P. Ginsburg's dataset of character interactions
in Les Misérables, which includes inferred encounters and unnamed characters,
is much larger than Knuth’s, comprising 181 characters and 500 interactions.¹⁰
However, since the purpose of this study is to evaluate network visualization
tools, Knuth's dataset was selected because it has become a standard reference
point in network analysis scholarship.


6 “What is Cytoscape?,” accessed May 1, 2022, https://cytoscape.org/what_is_cytoscape.html.

7 “Gephi documentation wiki,” accessed May 1, 2022, https://github.com/gephi/gephi/wiki.

8 Ibid.

9 The Stanford GraphBase data files are available from Skiena 2008. Knuth's personal website
notes that he realized later that he omitted an interaction between Fantine and Cosette, but
notes that his original data files should be considered “forever-frozen examples of typical data
that is more or less accurate.” See Knuth, “The Stanford GraphBase,” accessed May 7, 2022,
https://www-cs-faculty.stanford.edu/~knuth/sgb.html.

10 Michal P. Ginsburg, “Visualizing Les Misérables,” accessed May 7, 2022,
http://lesmiserables.mla.hcommons.org/.


3.3 Approach


As discussed above, the appearance of a network graph depends upon both the
layout algorithm and aesthetic choices applied. Each of the network visualization
tools discussed here implements several different force-directed layout algorithms,
as noted in Table 1. Each of these tools also allows users to apply many
different aesthetics to a network graph, such as node color, size and shape; line
color, width and style; label shape, size and font; overall image zoom; and
background color. Within each of these aesthetics there are frequently at least ten
different options, and Gephi and Cytoscape provide for user selection of colors using
RGB or six-digit hexadecimal notation, which encompasses 16.7 million colors.
(Pajek offers 96 defined colors.) Thus a comprehensive examination of all possible
combinations is beyond the scope of this chapter. Instead, this chapter examines 
the assumptions and effects of the default settings provided in the tools, which 
are likely to influence many users, especially those with limited background 
knowledge or technical expertise in graph drawing. Beyond this direct influence, 
understanding the assumptions that are built into these tools contributes to the 
critical awareness of how software shapes interpretation. In this study, the Les 
Misérables dataset was imported into Pajek, Cytoscape and Gephi, and images
were generated of the initial presentation of the data and of each force-directed
layout option available in each tool, using the default display settings.
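
The figure of 16.7 million colors mentioned above follows from the arithmetic of six-digit hexadecimal notation: each of the three RGB channels takes two hexadecimal digits, or 256 possible values. The short Python sketch below works through this calculation; the helper name `hex_to_rgb` and the sample color value are illustrative assumptions and are not drawn from any of the three tools.

```python
def hex_to_rgb(hex_color):
    """Convert a six-digit hexadecimal color such as '#1F77B4' to an (R, G, B) tuple."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


values_per_channel = 16 ** 2            # two hexadecimal digits per channel
total_colors = values_per_channel ** 3  # three channels: red, green, blue

print(total_colors)
print(hex_to_rgb("#1F77B4"))  # hypothetical sample color
```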


Table 1: Graph Layout Implementation in Network Visualization Tools. 


Tool             Force-Directed Layouts                               Other Layouts

Pajek 5.15a      Kamada-Kawai                                         Circular
                 Fruchterman-Reingold                                 Pivot MDS
                                                                      VOS Mapping
                                                                      EigenValues
                                                                      Tile Components

Cytoscape 3.9.1  Edge-Weighted Force Directed Layout (Biolayout)      Grid Layout
                 Edge-Weighted Spring-Embedded Layout (Kamada-Kawai)  Attribute Circle Layout
                 Prefuse Force-Directed Layout                        Group Attributes Layout
                 Compound Spring-Embedder Layout                      Circular Layout
                                                                      Hierarchical Layout

Gephi 0.9.5      Force Atlas                                          Circular Layout
                 Force Atlas 2                                        Contraction
                 Fruchterman Reingold                                 Dual Circle Layout
                 OpenOrd                                              Expansion
                 Yifan Hu                                             Label Adjust
                 Yifan Hu Proportional                                Noverlap
                                                                      Radial Axis Layout
                                                                      Random Layout
                                                                      Rotate


4 Results


As noted above, the Les Misérables character interaction network is of modest 
size, containing 77 nodes and 254 edges. As such, it is often used in studies of
network layouts and is even included within Gephi as a sample file. It thus serves as
a good example for examining how the three tools initially visualize the
topography of the network. This is a unimodal network, in which all nodes represent the
same kind of entity, in this case characters in the novel. Because force-directed 
algorithms display the more highly-connected nodes towards the center of the di- 
agram, the distribution of node degree within a dataset strongly influences the 
visualization of the network. Within the Knuth dataset, the node degree, or the 
number of different nodes a given node is directly connected to, ranges from 1 to 
36, with Jean Valjean, the novel’s protagonist, interacting with the greatest num- 
ber of other characters (36). Other high degree nodes include the urchin Gavroche 
(22), the student Marius (19), the police inspector Javert (17) and the innkeeper 
Thenardier (16). Figure 1 displays a histogram of node degree in the dataset. The 
majority (78%) of the novel’s characters are connected to ten or fewer other char- 
acters, with 35% of the nodes in the dataset connected to only one or two others. 


[Figure: histogram of count (y-axis) by node degree (x-axis)]


Figure 1: Node degree frequency in the Les Misérables dataset.


Another fundamental measure of a network’s structure is edge weight, the count 
of how many times a specific connection is repeated; in this dataset, characters 
who repeatedly interact within the novel’s chapters have edges with higher 
weights than those who only interact once or twice. Most (87%) of the character 
interactions recorded by Knuth occur five or fewer times and 97 of the 254 (38%) 
interactions recorded in Knuth’s dataset occur only once. Notably, there are 14
characters with a node degree of one who interact with that other character only
one time; six of these interact with Bishop Myriel and four with Valjean. With
such low node degree, the Myriel cluster, which also includes a seventh character
with node degree of only two, tends to be positioned near the edges of the graph
in most visualizations. Within the novel, these upper-class characters exist in a
world quite separate from that of the working- and middle-class characters.

There are four pairs of characters who interact repeatedly throughout the 
novel and thus have very high edge weights: the protagonist Valjean and his 
adoptive daughter Cosette (31); Cosette and her suitor Marius (21); Valjean and 
Marius (19); and Valjean and his enemy Javert (17). These four characters are cen- 
tral to the novel’s plot and themes. As the novel’s protagonist, Valjean has the 
highest node degree, meaning he is connected to the greatest number of other 
characters (36), and he also appears in the novel the greatest number of times 
(158, or 62% of the interactions in the dataset). However, an important character 
does not necessarily have to interact with a lot of different characters: Cosette’s 
node degree is only 11, but she appears very frequently in the novel’s pages: 68, 
or 27% of the interactions in the dataset. In network visualization, edge weight is 
conventionally visualized through line thickness, but other aesthetics such as 
color or line type (i.e., solid, dotted or dashed) can also enable the viewer to visu- 
ally compare edges based on their weight, in order to understand the frequency 
of that interaction in the data. Node degree and edge weight are fundamental 
measurements that greatly impact the visualization of the network. 
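
Both measures can be computed directly from a weighted edge list. The Python sketch below does so for a four-edge excerpt consisting only of the high-weight pairs named above, so its totals are far smaller than those of the full 254-edge dataset; the variable names and the tuple-based edge format are assumptions made for illustration. Node degree counts a node's distinct neighbors, while the summed edge weight (often called weighted degree or strength) counts its recorded interactions.

```python
from collections import defaultdict

# Excerpt only: the four highest-weight character pairs reported in the text.
edges = [
    ("Valjean", "Cosette", 31),
    ("Cosette", "Marius", 21),
    ("Valjean", "Marius", 19),
    ("Valjean", "Javert", 17),
]

neighbors = defaultdict(set)  # node -> set of distinct connected nodes
strength = defaultdict(int)   # node -> sum of the weights of its edges
for source, target, weight in edges:
    neighbors[source].add(target)
    neighbors[target].add(source)
    strength[source] += weight
    strength[target] += weight

# Node degree is the number of distinct neighbors.
degree = {name: len(others) for name, others in neighbors.items()}

for name in sorted(degree, key=degree.get, reverse=True):
    print(f"{name}: degree {degree[name]}, summed edge weight {strength[name]}")
```

Within this excerpt Valjean leads on both measures; in the full dataset, as the case of Cosette shows, a character can have a modest node degree and still account for a large share of the recorded interactions.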


4.1 Pajek 


In Pajek, a user’s first encounter with a set of network data is through a shaped, 
defined form because the software automatically applies a circular network 
graph layout to the data in the Draw window. The low resolution Draw window 
generates the working view of the graph along with any aesthetics that have been 
applied, either as part of an analysis or as a manual selection. Figure 2 shows the 
initial visualization of the Les Misérables character interaction network in Pajek’s
Draw window, before any layout algorithm has been applied by the user. (This
view can also be generated under Draw/Layout/Circular/Original.) Although this
layout is labeled a circular layout within the software, the shape of the graph
depends on the window used for the Draw view, and on a typical computer screen it
tends to be elliptical rather than exactly circular.
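
The dependence of this shape on the window can be made concrete with a short sketch. The Python code below is not Pajek's implementation; it simply assumes that nodes are spaced at equal angles around an ellipse inscribed in a drawing area, and the function name, margin and window dimensions are hypothetical. In a square window the result is a circle; in a wider window the same ordering of nodes is stretched into an ellipse.

```python
import math


def elliptical_layout(node_count, width, height, margin=20.0):
    """Place nodes at equal angles on an ellipse inscribed in a width x height window."""
    center_x, center_y = width / 2.0, height / 2.0
    radius_x = width / 2.0 - margin
    radius_y = height / 2.0 - margin
    positions = []
    for index in range(node_count):
        angle = 2.0 * math.pi * index / node_count
        positions.append((center_x + radius_x * math.cos(angle),
                          center_y + radius_y * math.sin(angle)))
    return positions


def extent(positions, axis):
    """Span of the layout along one axis (0 = horizontal, 1 = vertical)."""
    values = [p[axis] for p in positions]
    return max(values) - min(values)


# 77 nodes, as in the Les Misérables network; window sizes are assumed values.
square = elliptical_layout(77, width=600, height=600)
wide = elliptical_layout(77, width=1000, height=600)

print(f"square window: {extent(square, 0):.0f} x {extent(square, 1):.0f}")
print(f"wide window:   {extent(wide, 0):.0f} x {extent(wide, 1):.0f}")
```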

Figure 2: Pajek: initial view of the network in the low resolution Draw window.


In this default working view of the network, all nodes in the graph are represented
by equally-sized yellow circles which are evenly distributed in an elliptical
layout against a tan background with dark red node labels. Even in a relatively
small network such as this one, many of the node labels are illegible because they
overlap with nodes and edges. The display of node labels supports exploratory un- 
derstanding of network information, but they can interfere with the perception of 
the overall topology of the network. Although the display of node labels is a default 
feature of graph drawing in Pajek, labels have been removed in the remaining 
graph images so as to facilitate comparisons among the different layouts and tools. 

The colors, shapes and line styles used by default in the low resolution Draw 
window emphasize the nodes more than the edges and the tan background mini- 
mizes contrast between the visual elements. Higher resolution Encapsulated Post- 
Script (EPS) images can also be exported from Pajek, with the colors for nodes and 
edges set from a defined list of 96 named colors in the Export/Options screen. An 
exported EPS image thus uses different aesthetics than what is displayed in the 
working view of the graph. Figure 3 displays the initial view of the data with the 
default color settings used for EPS file export. Here the darker blue lines used for 
edges shift the visual balance a bit away from the nodes, which are displayed in a 
light salmon color. These EPS file settings are used in the remaining images gener- 
ated from Pajek. 

Figure 3: Pajek: initial view of the network with default color settings for EPS file export.


Although the Pajek reference manual simply states that nodes in this initial
layout are positioned by default “in order determined by the network,” examination
of the graph along with the node weights reveals that this layout is designed
to reveal clusters of connected nodes (Mrvar and Batagelj 2022: 69). High degree
nodes (those with a lot of connections), such as the one representing the novel’s
protagonist Valjean, are positioned so that the edges connecting them to low de- 
gree nodes cross the diameter of the ellipse. By default in Pajek, all edges are dis- 
played as equal width, drawn with thin blue lines, rather than visually indicating 
edge weight. These visual choices place more emphasis on node degree, or the 
number of edges connected to a given node, rather than on significant relation- 
ships between node pairs. 

Because this layout is the first view presented of the data in Pajek, the soft- 
ware promotes the assumption that the connections to high degree nodes are the 
most meaningful. Although this is a standard approach to analyzing networks, ex- 
amining the data in other visualizations can highlight other ways of seeing the 
nodes in the network. Figure 4 shows the same elliptical (“circular”) layout with 
random positioning of the nodes. As discussed above, low-degree nodes are pre- 
dominant in this dataset, which points to Hugo’s representation of class, gender 
and professional distinctions within an urban setting. Viewing the network with 
random positioning of nodes shifts the focus from the plotting around the novel’s 
central characters to the socio-historical view offered in the novel. 

Figure 4: Pajek: elliptical (circular) graph layout with random node ordering.


Pajek includes two classic force-directed algorithms in its layout options, which
it calls “Energy” layouts: Kamada-Kawai (Kamada and Kawai 1989) and
Fruchterman-Reingold (Fruchterman and Reingold 1991). The default setting for these
layouts selects a random node as the starting point for drawing the graph, although a
specific starting point may be selected by the user. Pajek offers several layout
parameters for the Kamada-Kawai algorithm, which is based on spring forces,
including options to fix the position of specific nodes, to optimize subclusters or to
optimize components (Mrvar and Batagelj 2022: 69). This layout tends to emphasize
symmetry and produces a boxy effect in the display of node clusters (Gibson et al. 
2012: 330). Figure 5 shows the Kamada-Kawai layout with separated components, 
which reduces visual overlap of the nodes (Kobourov 2013: 383). This layout visually 
distinguishes four groups among the more connected nodes and the default aes- 
thetics applied in Pajek represent higher edge weights with thicker lines. Because 
of the strong attractive force applied in the visualization, it is difficult to distinguish 
the individual nodes for Valjean, Marius, Javert and other key characters at the 
center of the graph. As in all force-directed layouts, this algorithm places highly- 
connected nodes towards the center of the graph and the very low degree nodes 
are clearly visible around the perimeter. Because these nodes are evenly arranged 
at some distance from the center of the graph, they are strongly distinguished from 
the more highly connected nodes, but it is difficult to perceive how they relate to 
the network as a whole. For example, the group of seven very low-degree nodes 
connected to Myriel are located in the upper right quadrant of the graph. Because 
they are spread out so far from each other and their edges cross others before
connecting with the Myriel node, these nodes are not visually distinct as a group.

Figure 5: Pajek: Kamada-Kawai layout with separated components.


Figure 6 displays the Les Misérables character network in Pajek using Fruchter- 
man and Reingold’s algorithm, which is based on the attractive and repulsive 
forces exerted between “atomic particles or celestial bodies” combined with an- 
nealing or cooling processes (Kobourov 2013: 385). This layout spreads out both 
the highly weighted and less weighted nodes to form a rounded shape to the 
graph overall. Pajek's implementation of the algorithm tends to draw edges 
closely alongside each other, creating an elongated visual effect. The intricacy of 
connections is minimized in this layout in favor of displaying node groups, which 
is helpful for understanding the structures within Hugo’s Les Misérables: the high 
degree nodes for the key characters Valjean, Javert, Marius and Cosette are located
at the very center of the graph; the nodes for the political revolutionaries
connected to Courfeyrac and Enjolras are arranged in an overlapping wedge in
the upper left quadrant; and the cluster of upper-class characters connected to 
Bishop Myriel is clearly visible on the right side of the graph. In this layout even 
small differences in node degree are visually distinguished because of the way 
the algorithm calculates the attractive force at the graph’s center. Although node 
placement does not have inherent meaning, the placement of the nodes relative 
to one another and to the center of the graph is meaningful.

Figure 6: Pajek: Fruchterman-Reingold layout.


4.2 Cytoscape 


Cytoscape provides four force-directed layouts in the main program: the Prefuse 
Force-Directed Layout; the Edge-Weighted Force Directed Layout based on the Biol- 
ayout algorithm, which is specifically designed for similarity analysis in biological 
research and does not work well for other kinds of network data; the Edge- 
Weighted Spring-Embedded Layout, which is an implementation of the Kamada- 
Kawai algorithm; and the Compound Spring-Embedder Layout, which is optimized 
for use with compound graphs as well as other networks (Heer et al. 2005; Enright 
and Ouzounis 2001; Kamada and Kawai 1989; Dogrusoz et al. 2009). Cytoscape man- 
ages aesthetics for edge and node style, color, shape and size, along with the data 
features that these aesthetics represent, through a palette of 18 predefined “styles.” 
Users can also modify these existing styles or create their own. 

The current version of Cytoscape initially displays network data using the 
Prefuse force-directed layout, although the software documentation states that 
the Grid Layout is the default view of the data. As shown in Figure 7, this view of 
the data is initially displayed in the “default” palette style, which represents 
nodes with light blue rectangles containing the node label in black and edges as 
thin grey lines on a white background. As with the default aesthetics applied in 
Pajek, the emphasis here is on the nodes, more than the edges: the large rectangular
nodes provide information but also obscure some of the edge intersections,
making it difficult to perceive the relationships between the nodes.

Figure 7: Cytoscape: initial view of the network (Prefuse layout with default style).


Figure 8 shows the same Prefuse layout with the “default black” aesthetic style
applied and node labels removed. This style represents the nodes with small 
white circles and edges with thin green lines on a black background, which pro- 
vides a better topological understanding of the network than the “default” style. 
Both of these styles display node labels by default and do not visually represent 
edge weights. These aesthetics are user-modifiable, but the default settings reflect 
the overall focus in Cytoscape on node information, which derives from its original
development for biological research. Nevertheless, the layout algorithms in
the software provide for the exploration of network structures as well. Nodes are 
evenly spaced in the Prefuse layout and clusters are arranged in a circular pat- 
tern that clearly shows the interactions among the nodes. Even the many edges 
connecting the high-degree nodes at the graph's center are visually distinct in this 
figure. To facilitate comparisons of the layouts and tools, the “default black” style 
is used in the remaining images generated from Cytoscape. 


Figure 8: Cytoscape: Prefuse layout with default black style. 


Figure 9 shows the Les Misérables network in Cytoscape’s Edge-Weighted Spring 
Embedded Layout, which is an implementation of the Kamada-Kawai algorithm. 
Despite the layout's title and the fact that it includes a weighting parameter, the 
default aesthetic styles do not visualize the edge weights unless that option is
selected by the user. Comparing Cytoscape's implementation of Kamada-Kawai
with that in Pajek (Figure 5) shows overall topological similarities, as would be ex- 
pected, but individual nodes and groups within the network are more clearly sepa- 
rated. The high-degree nodes for Valjean, Javert, Marius and Cosette near the 
center of the graph can be clearly distinguished from one another; the two groups 
of political revolutionaries are shown as overlapping but distinct wedge shapes on 
the left; and the Myriel group is placed in the upper right quadrant of the graph. 


Figure 9: Cytoscape: Edge-Weighted Spring-Embedded layout (Kamada-Kawai). 


Figure 10 shows the Les Misérables network in the Compound Spring-Embedder 
layout in Cytoscape, which evenly spaces the nodes in the graph and thus tends to 
arrange nodes in a circular fashion. Clusters of nodes are less clearly distin- 
guished from one another and the node spacing makes the edges in the graph 
more visible. This layout is thus useful for exploring the paths of connection 
among individual nodes and clusters. As noted previously, the fact that the Myriel 
cluster of nodes appears on the left side of this graph rather than on the right, as 
it did in other figures, bears no significance. What is significant, and represented
in each algorithm and tool, is that the characters connected to Myriel are strongly 
distinguished from those connected to the revolutionaries Courfeyrac and Enjol- 
ras, as shown in the annotations in Figure 11. 


Figure 11: Cytoscape: Compound Spring-Embedder layout with select nodes annotated.


4.3 Gephi


The Gephi interface is divided into three main windows: the Data Laboratory, 
which displays the node and edge data tables; the Graph window, which allows 
for analysis and manipulation of the network; and the Preview window, in which 
the image is fine-tuned for exporting to an image file. Aesthetic choices, such as 
color and size for nodes and edges, can be applied in both the Graph and Preview 
windows. Figure 12 displays the initial view of the Les Misérables network in the 
Graph window, before any layout algorithm has been applied. Nodes are dis- 
played as small black circles and edge weights are visually indicated with propor- 
tional edge widths. Of the three tools surveyed here, Gephi is the only one that 
privileges a random view of the dataset by making it the default view of a new 
network. But this random view is not without an implied interpretive approach: 
by displaying edge weights in this initial view of the data, certain connections are 
visually distinguished from the rest, such as the frequent interactions between 
Valjean and Cosette. Figure 13 displays the same initial view of the data, before 
any layout or aesthetics have been applied, in the Preview window. Even though 
networks are displayed with straight edges in the Graph window, the default set- 
ting in the Preview window is to display curved edges. In the figures that follow, 
straight edges are used to facilitate comparison with earlier figures. 

Gephi provides four force-directed layouts in the main program: Fruchterman 
Reingold, Yifan Hu, Force Atlas and OpenOrd (Fruchterman and Reingold 1991; Hu 
2006; Jacomy et al. 2014; Martin et al. 2011). As noted previously, the Fruchterman-Reingold
algorithm is based on the forces between celestial bodies and the algorithm
was designed to produce "even node distribution, few edge crossings, uniform
edge length, symmetry and fitting the drawing to the frame" (Gibson et al.
2012: 331). Fruchterman-Reingold layouts tend to produce an overall rounded shape 
with evenly spaced nodes throughout the graph (Gibson et al. 2012: 332). Figure 14 
displays the Les Misérables network using the default settings for speed and gravity 
in Gephi's Fruchterman-Reingold implementation. Because Gephi visually encodes 
edge weight by default, this graph emphasizes strong connections between particu- 
lar nodes, rather than node degree. Key groups of nodes are visible in the graph, 
but are more difficult to distinguish than in some other layouts because of the way 
low-degree nodes are arranged around the perimeter with long edges connecting 
them to nodes in the center of the graph. As shown in Figure 15, the Myriel group is 
on the left side of the graph, but is difficult to discern because it overlaps with the 
group connected to Fantine. The Gestalt principles of similarity and figure-ground 
patterns explain why the Myriel group, with some of the lowest edge weights in the 
network, appears as though it lies behind the Fantine group: darker lines are per- 
ceived as though they are in the foreground and thus the Fantine group, with edge 


Figure 12: Gephi: initial view of the data in the Graph window.


weights ranging from 9 to 15, is easier to see (Ware 2013: 189-191). The uniform node
separation in this layout creates a strongly rounded shape to the graph as a whole 
with triangular shapes between connected nodes. This layout thus emphasizes rela- 
tionships between individual nodes, rather than the structure of groups or clusters 
within the graph. 
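
The force model described above can be sketched in a few lines. The following is a minimal, unoptimized illustration of the Fruchterman-Reingold idea (repulsion between all pairs of nodes, attraction along edges, and a cooling "temperature" that limits movement), not Gephi's implementation. The function name, its parameters and the six-character toy network are assumptions made for illustration; the edges are invented and are not taken from the Knuth dataset.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=200, size=1.0, seed=1):
    """Toy Fruchterman-Reingold layout; returns {node: (x, y)} inside a square frame."""
    rng = random.Random(seed)
    # Random start, comparable to the unformatted initial view of a new network.
    pos = {v: [rng.uniform(0.0, size), rng.uniform(0.0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))          # ideal distance between nodes
    temperature = size / 10.0                 # maximum movement per iteration
    cooling = temperature / (iterations + 1)  # linear cooling schedule

    def delta(u, v):
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        return dx, dy, max(math.hypot(dx, dy), 1e-9)

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy, d = delta(u, v)
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Attraction along each edge: d^2 / k.
        for u, v in edges:
            dx, dy, d = delta(u, v)
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each node, capped by the temperature, and keep it inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[v][0] = min(size, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + dy / d * step))
        temperature -= cooling
    return {v: (x, y) for v, (x, y) in pos.items()}


# Hypothetical toy network (illustrative edges only).
nodes = ["Valjean", "Javert", "Cosette", "Marius", "Myriel", "Fantine"]
edges = [("Valjean", "Javert"), ("Valjean", "Cosette"), ("Valjean", "Marius"),
         ("Cosette", "Marius"), ("Valjean", "Myriel"), ("Valjean", "Fantine")]
layout = fruchterman_reingold(nodes, edges)
```

Because every pair of nodes repels, the sketch spreads nodes evenly across the frame whatever groups they belong to, which is consistent with the rounded, evenly spaced appearance discussed above.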

The Yifan Hu layout algorithm uses the attraction and repulsion forces pro- 
duced by pairs or groups of nodes to first produce a version of the graph at a 
coarse resolution. By iteratively filling in the rest of the graph structure, this algo- 
rithm reduces the power required to process very large graphs (Gibson et al. 
2012: 336-337). The current version of Gephi offers both the initial Yifan Hu algo- 
rithm and the Yifan Hu Proportional layout, which adds more distance between 
central and outer nodes (Cherven 2015: 76). Figure 16 shows the Les Misérables
network in the Yifan Hu layout with default settings applied. Compared to Fruch- 
terman-Reingold, the Yifan Hu layout visually separates groups of nodes and dis- 
tinguishes between tightly interconnected clusters and groups of nodes that fan 


Figure 13: Gephi: initial view of the data in the Preview window.


out from a single shared connection. The Myriel group is clearly visible at the bot- 
tom of the graph. However, Gephi’s default coloring of black nodes and edges, 
along with the scaling used to represent edge weights, can make it difficult to see 
all of the nodes in a network, especially with heavily weighted connections such 
as those between Valjean, Javert and Marius, shown here with very thick edge 
lines. 

The OpenOrd algorithm, based on simulated annealing processes, is another 
multi-level approach designed for showing global structure in very large networks; 
however, with a network like this one with under 100 nodes, it can be less effective 
than other force-directed layouts (Martin et al. 2011; Gibson et al. 2012: 338). By de- 
sign, OpenOrd produces longer edges in a graph than Fruchterman-Reingold to 
help distinguish clusters within the network’s structure. OpenOrd offers a number 
of user-customizable parameters including edge cutting, which affects edge lengths 
in the graph. Figure 17 shows the Les Misérables network in OpenOrd with the 
Gephi default parameter settings and default aesthetics. Figure 18 shows the same 


Figure 14: Gephi: Fruchterman-Reingold layout.


layout colored by modularity, which highlights four main clusters in the network: 
the cluster of Valjean and other high-degree main characters in the center of the 
graph; the revolutionary cluster at the lower right, which because of the edge 
width scaling looks almost as prominent; the group of seamstresses connected to 
Fantine at the upper right and the Myriel group at lower left. This layout empha- 
sizes the highly weighted connections across and within clusters. 

The ForceAtlas algorithm was designed by the Gephi team with the release of 
the software in 2009 to make possible real-time continuous network visualization 
with user-customizable parameters for attraction, repulsion and gravity (Bastian 
et al. 2009: 361). Since the 2014 release of ForceAtlas2, ForceAtlas is considered 
obsolete, but is still included in the program (Jacomy et al. 2014: 1-2). ForceAtlas2
is a continuous algorithm: “As long as it runs, the nodes repulse and the edges 
attract. This push for simplicity comes from a need for transparency. Social scien- 
tists cannot use black boxes, because any processing has to be evaluated in the 
perspective of the methodology” (Jacomy et al. 2014: 2). By design, Gephi users 


Figure 15: Gephi: Fruchterman-Reingold layout, detail of Myriel and Fantine groups.


can experiment to see the effects of changing the algorithm's parameters. Fig- 
ure 19 shows the Les Misérables network in the ForceAtlas2 layout with default 
settings applied. Because the algorithm tends to overlap the nodes unless the de- 
fault parameters are changed, the node clusters are difficult to distinguish in Ge- 
phi’s default aesthetic, which for this algorithm displays the nodes at a large size 
relative to the edges. Figure 20 shows the same graph with grey, rather than 
black, nodes and edges, which provides a better view of the network’s structure. 
Like Yifan Hu, ForceAtlas2 separates smaller node groups from the central highly 
connected nodes of the network. 

As noted above, Gephi's default aesthetics scale the width of the edge lines 
according to edge weight, and color both nodes and edges black on a white back- 
ground. As with Pajek and Cytoscape, Gephi makes it possible to change the aes- 
thetics of node shape, size and color according to features of the data, such as 
node degree, statistical measures of centrality, modularity or categorical partitions
in the data. However, Gephi also provides an interactive interface that allows


Figure 16: Gephi: Yifan Hu layout.


a user to click directly on nodes to move them or to apply specific aesthetics
independent of the data features. Edge color, width and style can also be changed 
to reflect features of the data or other aesthetic preferences. Aesthetic changes 
are immediately visible in the Graph window and contribute to the exploratory 
environment for which Gephi is designed. Additional aesthetics for node and 
edge label styles, node opacity and borders and line shape are applied in the Pre- 
view window. With so many aesthetic possibilities, nearly infinite aesthetic trans- 
formations are possible for any given network graph, even without adjusting the 
layout. 


Figure 17: Gephi: OpenOrd layout.


5 Discussion 


Because network visualization relies on aesthetic properties to communicate in- 
terpretations of data, understanding how different tools implement key layout al- 
gorithms and aesthetic styles is important, not only in terms of selecting the best 
tool for a particular project, but also for critically analyzing the products of these 
tools. As suggested here, the aesthetic dimension of network graphs produces 
meaning far beyond their explicit data mappings. Symmetry, contour, shape and 
color influence our attention and perception and can serve to highlight or ob- 
scure certain aspects of the network. Any network visualization is the product of 
numerous decisions which are too often not only undocumented, but even delib- 
erately hidden from view. This section examines the most widely circulated visu- 
alization of the Knuth Les Misérables network in order to reveal the technological 
and aesthetic mystification it embodies. 


Figure 18: Gephi: OpenOrd layout with modularity.


When a user first opens the Gephi program, a welcome screen offers the user 
shortcuts to open recent files, to start a new project, or to open one of three net- 
work files that are included with the basic installation. These sample files offer 
insight into some of the assumptions and expectations that were built into Gephi's 
design. When a user opens the included file of Knuth's Les Misérables network, it 
appears in the program's Graph window not as unformatted, randomly ordered 
nodes, but with a layout algorithm and numerous aesthetic features already ap- 
plied, as shown in Figure 21. These aesthetics are presumably intended to do dou- 
ble duty, serving not only to elucidate the character interactions that structure 
Hugo’s novel, but also to exemplify good network visualization practices. 

This network graph is built into the visualization software itself and presented 
without comment or explanation, as if its design makes it fully self-explanatory. 
However, like all network visualizations, this graph is a construction, so examining 
how it was made is important. It demonstrates several conventional practices in 


Figure 19: Gephi: ForceAtlas2 layout.


Figure 20: Gephi: ForceAtlas2 layout with grey color applied.


network visualization: nodes are sized according to node degree, edge widths 
are sized according to edge weight, and nodes are colored by modularity group- 
ings. In addition, this graph embeds a variety of assumptions in its aesthetics. 
Bright, appealing colors and large node sizes simplify the appearance of the 


Figure 21: Les Misérables network graph included with Gephi installation.


graph. In this visualization, Hugo’s multi-character saga is simplified into a 
story focused on the large red node representing Valjean. The complexity of the
characters’ connections is visually minimized by using very fine edge lines relative
to the node sizes, which modifies one’s perception of the scale of the full
dataset. 

Not only is this graph not produced with the default settings in the software, 
but most of its aesthetics appear to have been manually created to produce cer- 
tain desired effects. Gephi provides numerous ways for users to alter the aes- 
thetics applied to a graph in order to reveal or enhance certain interpretations of 
the network; unless these adjustments are documented, which the software itself 
does not do, viewers may not be able to recognize their effect on the final graph. 
For example, this visualization uses a bright palette that is not among the stan- 
dard color sets included with the program; to use this palette, each color would 
have to be individually specified with RGB or hexadecimal color values. Not in- 
cluding this example's color palette in the software almost seems like a deliberate 
attempt to frustrate users who might wish to emulate it. 

An attempt at recreating the Gephi sample visualization using the ForceAtlas2 
layout algorithm is shown in Figure 22. The process of recreating that visualiza- 
tion reveals several key aesthetic manipulations that were used to produce the 
sample visualization. Several experiments with this layout and with Yifan Hu,
which ForceAtlas2 most closely resembles, suggest that the length and placement
of the edges towards the periphery of the network may have been manually ad- 
justed to create tighter clusters. In addition, the node sizes have been adjusted to 
produce the dramatic size differences shown in this sample visualization, rather 
than using a mathematically scaled approach to representing the node degree 
range in the network. These adjustments significantly alter the appearance of the 
graph and the way it guides interpretation of the novel's data. 
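
For comparison, a "mathematically scaled approach" to node size can be as simple as a linear mapping of the degree range onto a size range. The sketch below is one such mapping, under stated assumptions: the function name, the size bounds and the example degrees are hypothetical and do not reproduce Gephi's ranking functions.

```python
def scale_sizes(degrees, min_size=10.0, max_size=50.0):
    """Linearly map node degree onto the interval [min_size, max_size]."""
    lowest, highest = min(degrees.values()), max(degrees.values())
    span = highest - lowest
    if span == 0:
        # All nodes share one degree: give them the same mid-range size.
        return {node: (min_size + max_size) / 2 for node in degrees}
    return {
        node: min_size + (degree - lowest) / span * (max_size - min_size)
        for node, degree in degrees.items()
    }


# Hypothetical degrees for three nodes.
degrees = {"A": 1, "B": 6, "C": 11}
sizes = scale_sizes(degrees)
# A -> 10.0, B -> 30.0, C -> 50.0
```

With a mapping of this kind, a node's size is fully determined by its degree and the two chosen bounds, so any departure from it in a published graph signals a manual intervention.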


Figure 22: Reconstruction of the Gephi sample network graph. 


As shown in Figure 21, nodes in the sample visualization are colored according to 
modularity groups. Gephi's modularity analysis tool implements the Louvain 
method for community detection, which discovers small communities and then 
iteratively groups them into larger ones within the network (Blondel et al. 2008).
When this modularity analysis is run on the Les Misérables network using the de- 
fault settings, the program produces six modularity classes rather than the nine 
that are shown in Figure 21, suggesting some manual adjustments were made to 
produce that sample visualization. These adjustments have important effects on 
the visualization, because the modularity group colors and node sizings used in 
the Gephi sample visualization sharply distinguish Valjean from other main char- 
acters in the novel, including Javert, Thenardier, Marius and Fantine, as shown in 
the detail in Figure 23. In the sample file, the Valjean cluster consists only of Val- 
jean and a few minor characters with low node degree, and the other high degree 
characters are clearly separated into other modularity groups. 
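
The quantity that community detection of this kind tries to maximize, modularity, can be computed directly for any proposed grouping. The sketch below does so for a toy weighted graph; it only scores a given partition and does not implement the Louvain optimization itself. The function name and the toy data are assumptions, and the resolution parameter is written in one common form (a multiplier on the expected-edges term) that may not match the parameterization of Gephi's resolution setting.

```python
from collections import defaultdict


def modularity(edges, community, resolution=1.0):
    """Modularity of a partition.

    edges: list of (u, v, weight) for an undirected graph.
    community: dict mapping each node to a community label.
    """
    total = sum(weight for _, _, weight in edges)   # total edge weight m
    internal = defaultdict(float)                   # weight inside each community
    degree = defaultdict(float)                     # summed weighted degree per community
    for u, v, weight in edges:
        degree[community[u]] += weight
        degree[community[v]] += weight
        if community[u] == community[v]:
            internal[community[u]] += weight
    return sum(
        internal[c] / total - resolution * (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Toy graph: two triangles joined by a single edge, all weights 1.
toy_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
             ("c", "d", 1)]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(toy_edges, two_groups)   # 5/14, about 0.357
q_one = modularity(toy_edges, one_group)    # 0.0
```

Splitting the toy graph into its two triangles scores higher than treating it as one group, which is the sense in which a partition is "discovered"; reassigning a node by hand to a different color class, as appears to have happened in the sample file, bypasses this score entirely.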


Figure 23: Detail of high degree nodes in the Gephi sample graph shown in Figure 21. 


While recreating the sample visualization for Figure 22, multiple experiments 
were made with adjusting the modularity resolution setting, which affects how 
many groups are produced. However, Valjean was always grouped with one or 
more of the high degree characters he is highly connected to. For example, in the 
reconstruction shown in Figure 22, Valjean is grouped with his antagonist, police 
inspector Javert, as shown in the detail in Figure 24. These experiments, like the 
other visualizations presented in this chapter, suggest that Valjean's interconnec- 
tions with a wide range of characters may be of equal or greater significance in 
the novel. By setting this graph as an example to learn from, but not providing a 
full explanation of how it was created, Gephi’s designers set unrealistic expecta- 
tions for simplicity and clarity in network visualizations. Even more problematic 


Figure 24: Detail of high degree nodes in the reconstruction of the Gephi sample graph shown in
Figure 22.


is the fact that numerous aesthetics in this image were manipulated to create cer- 
tain effects that shape viewers' perceptions of the underlying data. 

This critical account of the default settings in three open source network visualization
tools reveals how visualization aesthetics encode meaning both
explicitly and implicitly. Each of these tools offers numerous ways that the user 
can customize the appearance of the graph so as to promote interpretations of 
the data. Learning more than one tool allows the researcher to select the software 
that will be most appropriate for a given project. Although Pajek works very well on
large graphs, its visualization capabilities are somewhat less flexible than those of the
other tools, due to the limited color selection, low resolution images in the Draw
screen and differences between the working view and the exported image. The 
Cytoscape style palettes offer simple one-click adjustments to the appearance of 
the graph that alter multiple aesthetics at once, but most of these are designed for 
specific scientific purposes. Although the Gephi sample visualization demon- 
strates some of the rich aesthetic possibilities in the tool, its default aesthetics are 
very spare and learning to manipulate the many different aesthetic controls can 
take some time. No single tool or algorithm is necessarily better than the others; 
rather, understanding the differences among them and exploring a given network 
in multiple layouts and in different tools can provide new insights. Because network
visualizations assist in understanding data structures at a variety of scales,
critical awareness of how network graphs are produced contributes to a more sit- 
uated, self-reflective data visualization practice that recognizes how aesthetics 
are always creating meaning. 


Abstract: In this article, we investigate the epistemological dimensions pertaining
to the notion of scale in Digital Humanities (DH). We first echo the growing concerns
of digital tool-makers who call for separating the notion of complexity
from the metaphor of scale frequently used in the DH literature. We then harvest 
a corpus of 825 DH papers related to the notion of scale and we build a semantic 
map on top of them to highlight the various ways DH scholars make use of scale 
in their contributions. By reviewing this map, we show that scale acts as a blurry 
concept in DH literature along with level used as a sibling. We then argue for dis- 
tinguishing between level and scale to reconstruct and visualize the complexity of 
the social sphere. We redefine level and scale as operators of a socio-technical al- 
gebra. This algebra aims at increasing our collective capacity to mine digital 
traces. We then give practical examples of the joint use of level and scale with the 
phylomemy reconstruction process. We finally introduce GarganText, a free text
mining software and heir of de Rosnay's Macroscope. We explain how GarganText
embeds our definitions of level and scale considering mathematical foundations, 
programming technologies and design principles. 


Keywords: level, scale, phylomemy, semantic map, macroscope 


1 Introduction 


The metaphor of scale has spread widely in the literature and daily vocabulary of 
Digital Humanities (DH). As scholars in Social Sciences (SS) are more and more 
interested in the scientific potential of digital resources, they look for new termi- 
nologies to revisit classical concepts and define innovative approaches. However, 
the epistemological definition of scale remains blurry and each sub-domain of DH 
has its own interpretation. In the best-case scenario, it can either refer to the 
promises of digital data or to some exploration mechanisms implemented within 
software; otherwise, scale is simply used as a stylistic formula. Yet, other research 
domains - especially the field of Complex Systems (CS) - have deeply investigated
the notions of scale and have also introduced the sibling concept of level. This vo- 
cabulary issue is of great importance: by clearly shaping the notions of level and 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial 4.0 International License. 
https://doi.org/10.1515/9783111317779-015 
scale we might reinforce the epistemological connections between Social and
Computer Sciences. Indeed, we argue for the creation of a socio-technical algebra
able to translate any research question of Social Sciences into a mathematical proposition
interpretable by computer scientists and implementable in software. Given
the current confusion around the metaphor of scale in DH, we have
chosen to focus on both level and scale to turn them into the first two operators 
of our future socio-technical algebra. As part of the Zoomland effort, our paper 
will thus argue for re-defining level and scale (section 4) by using arguments from 
digital tool-makers (section 2) and by conducting a wide review of the DH litera- 
ture (section 3). We will then give practical examples of how level and scale can 
be used to reconstruct socio-historical processes from digital resources (section 5) 
and introduce GarganText, a free text mining software that implements these two
notions (section 6), which makes it possible to combine these operators for inves- 
tigating a research question. 


The evolution of social sciences 


Social Sciences have been deeply impacted by the revolution in Information
and Communication Technologies of the 2000s (Borgman 2003). Over the past 20 years, the
unprecedented flow of digital data produced by communication devices and elec- 
tronic networks has induced an epistemological shift among SS: as new research 
questions arose, fieldwork and experimental practices evolved to study the digital 
world. SS had to investigate the inner nature of digital data, explore the scientific 
capacity of such new materials and create dedicated tools (Edelmann 2020). Scholars 
then organized themselves and gave birth to the wide domain of Digital Humani- 
ties (Mounier 2012) and sibling fields: Digital Studies (Stiegler 2016), Digital Sociol- 
ogy (Boullier 2015), Digital History (Kemman 2021), Digital methods (Rogers 2013), 
Cultural Analytics (Manovich 2016), etc. Within the scope of DH, researchers now
consider the digital world as a new reflexive way to study our societies and collective 
memories. 


The promises of digital resources 


At the same time, archiving and knowledge institutions (libraries, museums, etc.)
invested in state-of-the-art infrastructures to store, curate and browse large databases
of digital resources (born-digital data or late-digitized data¹). By doing so,
they made it possible to dive into the richness of these catalogs and mine digital traces.
Indeed, digital resources are charged with an evidential power (Ginzburg 2013)
that bears witness to some socio-historical processes and can thus be considered
as traces of the social sphere. But investigating such data is not a trivial task and 
DH scholars have to tackle various socio-mathematical issues when they face digi- 
tal traces: volume, heterogeneity, discontinuity, incompleteness, etc. Yet, these ef- 
forts are worthwhile as the power of connectivity of digital resources (hypertext 
connections, dynamical properties, multi-dimensional relationships, etc.) enables 
more complex and valuable studies (Boullier 2015). 


The point of view of digital tool-makers 


The mutation of part of Social Sciences into Digital Humanities is intrinsically connected
to scholars' comprehension of the nature and potential of digital resources.
Consequently, digital tool-makers² have become central within the scientific ecosystem
of DH and their voices should be taken into account and integrated into our
own discussions. So far, the notion of scale in DH has mostly been empirically in- 
vestigated and materialized through the uses of software designed by digital tool- 
makers: scale is an in situ feeling whose potential is still largely unexplored. Re- 
cently, tool-makers have been interested in the mathematical, technical and visual 
explainability of their own tools (Jacomy 2021). They aim at preventing complexity 
questions from being hidden behind the metaphor of scale and the illusion of conti- 
nuity induced by software between original data sets and experimental outcomes. 
This idea of not hiding the complexity will now guide our reasoning and help us to 
re-define scale and level. 


2 A false feeling of continuity 


In what follows, we will focus on data exploration tools used by scholars in Digital 
Humanities as they all share a common undertone: they are powered by scale 
mechanisms. Tool-makers usually refer to exploration tools as datascapes;
that is, ad hoc exploratory environments used by scientists to study digital
materials (Girard 2017).


1 With some exceptions, we will mainly use digital resources to refer to both born-digital and
late-digitized data.
2 Engineers, computer scientists or social scientists acculturated to digital technologies.

These tools are always built in relation to a specific data set,
an issue or a research topic, in the same way as R. Rogers's issue-driven methodology
for building digital tools (Rogers 2013). Datascapes can thus be seen as a monadic
way to investigate digital traces³ (Tarde 2011). There, what matters most is
the upstream harvesting of an input digital material and the initial hypothesis 
formulated by the researchers that will later drive the making of the datascape. 
Sometimes datascapes become standalone software such as the graph analysis
software Gephi (Bastian 2009): a datascape that emancipated itself from ad hoc 
constraints and became generic. Overall, every datascape conveys a shared feel- 
ing, the impression that one can zoom within the data, navigate the digital resour- 
ces, change the "scale of analysis". 

Scaling mechanisms are inherent to datascapes. For instance, some research- 
ers use zoom features to explore a citation network with Gephi, other scientists 
apply a change of scale by annotating individual documents to reveal - from a
bird's eye view - interactions that structure a body of texts, some scholars take 
advantage of geographical data to analyze the dynamics of an historical process 
in various places, etc. In the context of digital humanities, tool-makers rely on the 
nature of digital resources to unblock the navigation from individual elements to 
collective structures. Zooming in or out thus appears to be a natural metaphor to 
explain how the majority of exploration tools work (Boullier 2016). However, 
the change of scale, as a DH notion, has never been investigated. This is a major
issue because zoom mechanisms induce a false feeling of continuity between the
original digital resources and their future explorations. Tool-makers hide a lot of
unnatural tasks behind multi-scale features (Boullier 2016): filtering, aggregation,
re-processing, etc. The intrinsic complexity contained within the original
digital resources is thus reduced before being explored through any interface. We 
think that this is the source of the paradox described by M. Jacomy (tool-maker 
and co-inventor of Gephi) in his thesis: 


A Gephi user once told me: "Gephi understands the network, but I do not understand Gephi."
I understand this statement as an acknowledgement that the visualization is correct despite 
being incomprehensible. (Jacomy 2021: 190) 


By hiding the complexity of the digital resources behind a false feeling of continu- 
ity, implicit zoom mechanisms induce visual explainability issues. The notion of 
scale needs to be defined and separated from the notion of complexity to improve 


3 In Metaphysics, monad means unity; in Mathematics, monads are used to define sets of rules
between categories sharing a common space (i.e., adjunctions); in functional programming, monads
are used to abstract control flows and side-effects.

exploration and analysis processes in DH. Our goal in this paper is to follow a
complex systems approach to define the concept of scale along with the sibling 
concept of level. We think that this clarification can contribute to the conceptual 
corpus of Digital Humanities and guide the making of innovative tools, as suggested
by M. Jacomy, to study social and historical phenomena:


We cannot see into the complex as if it was simple. We must switch metaphors and build 
our scientific apparatus from a different perspective. We must build something else, for in- 
stance, complexoscapes — composite visualization systems where inevitable reductions are 
counterbalanced by the possibility of navigating between complementary views and visual- 
izations. (Jacomy 2021: 190) 


3 Scale and level as undertones in Digital 
Humanities 


To complete the point of view of digital tool-makers (see section 2) and support 
our thought, we will now conduct a wide review of a corpus of Digital Humanities 
papers that somehow use the words scale and level in their arguments. By analyz- 
ing the many ways scale and sibling expressions are used in the DH literature, we 
will understand how the DH community organizes itself (or not) around this no- 
tion. We will use bird's-eye visualizations of the whole corpus and review individ- 
ual contributions to improve our analysis. 

We first harvest a corpus of 825 scientific papers' metadata⁴ (titles and abstracts)
extracted from both the Web of Science and Scopus and matching the query
("digital history" or "digital humanit*") and (scale or level or multi-level* or multi-scale*
or macroscope or "scalable reading" or "deep mapping"). With the free text
mining software GarganText⁵ we then extract a list of terms and expressions used
within the papers by the researchers. GarganText next reconstructs connections be- 
tween these terms by computing the conditional probability of having one term 
written in a paper jointly with another from the list. The resulting map (Figure 1) 
shows the semantic landscape of our corpus. There, terms are dots and semantic 
relationships are edges. Colors highlight communities of terms more frequently 
used together, these groups represent the main subjects of research and communi- 
ties of interest hidden within our 825 papers. We count 5 distinct communities: digi- 


4 The corpus, the list of terms and the resulting map can be downloaded at https://doi.org/10. 
7910/DVN/8C1HKQ. Accessed July 10, 2023. 
5 See https://cnrs.gargantext.org/. Accessed July 10, 2023. 
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tal history and the detection of patterns in literature (purple), the issue of digitiza- 
tion of cultural heritage (pink), modeling as goal (brown, Figure 2), digital library 
and the quality of metadata (orange) and the issue of visualization (blue, Figure 3) 

Figure 1: Semantic map of 825 scientific papers extracted from both the Web of Science and Scopus.
Built with GarganText and spatialized with Gephi.
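
The link weights of such a map can be illustrated with a minimal sketch. It assumes that the conditional probability of reading term b given term a is estimated as the share of papers containing a that also contain b; the function name, the toy corpus and this estimator are illustrative assumptions, not GarganText's actual code.

```python
from collections import Counter
from itertools import permutations


def conditional_probabilities(docs, terms):
    """Return {(a, b): P(b | a)} estimated from document-level co-presence.

    Assumption: P(b | a) = (#docs containing a and b) / (#docs containing a).
    """
    vocabulary = set(terms)
    doc_count = Counter()   # number of documents containing each term
    pair_count = Counter()  # number of documents containing both a and b
    for doc in docs:
        present = vocabulary & set(doc)
        doc_count.update(present)
        # Ordered pairs, because P(b | a) and P(a | b) generally differ.
        pair_count.update(permutations(sorted(present), 2))
    return {(a, b): pair_count[(a, b)] / doc_count[a] for (a, b) in pair_count}


# Hypothetical toy corpus: each paper is reduced to the listed terms it contains.
toy_docs = [
    {"knowledge graph", "rdf"},
    {"knowledge graph", "rdf", "metadata"},
    {"knowledge graph", "rdf"},
    {"knowledge graph", "rdf", "digital library"},
    {"knowledge graph", "metadata"},
    {"rdf", "metadata"},
    {"rdf"},
]
weights = conditional_probabilities(
    toy_docs, ["knowledge graph", "rdf", "metadata", "digital library"]
)
print(weights[("knowledge graph", "rdf")])  # share of "knowledge graph" papers that mention "rdf"
```

In this toy corpus the weight is directed: four of the five papers mentioning knowledge graph also mention rdf, whereas only four of the six papers mentioning rdf mention knowledge graph.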


The semantic map (Figure 1) shows that scale is not a unified notion in the DH
literature, as our corpus of papers organized itself in very distinct and distant
communities (a unified literature would have produced a single, large and central
community). Yet, scale lives through the entire DH literature, not as a well-defined
concept, but more as a blurry undertone. There, scale conveys various
meanings among different communities:

- promises for future works (“how to scale up the solutions based on collaborative
  research efforts” (Tolonen 2019));

- range of actions over a digital data set (“micro-scale uploads” (Mcintyre 2016),
  “fine-grained annotation” (Wang 2021));

- impact, quality or context of a historical event or object (“the rise of
  revolutionary movements made manifest through large-scale street actions” (Sakr
  2013), “a center of high level of artistic production” (Boudon 2016), “the
  reconstruction of macro- and micro-contexts” (Nevalainen 2015));

- quantity of data analyzed in a paper (“large scale analysis” (Risi 2022),
  “visualizing information on performance opens new horizons of significance for
  theatre research at scales” (Bollen 2016));

- scalability issues for data engineering (“regarding the overall archiving
  storage capacity and scalability” (Subotic 2013));

- scope of the observed patterns or temporal motifs (“macro-level patterns of
  text and discourse organization” (Joulain 2016));

- choice of an analytical layer in the case of multi-dimensional or cross-domain
  analysis (“every object that is catalogued is assigned entry-level data, along
  with further data layers” (Edmond 2017));

- exploration and visualization tasks (“interfaces are a valuable starting point
  to large-scale explorations” (Hinrichs 2015), “screen-based visualization has
  made significant progress, allowing for complex textual situations to be
  captured at the micro-and the macro-level” (Janicke 2017)), see Figure 3;

- choice of a fragmented / elementary reconstruction approach (“our method
  relies on character-level statistical machine translation” (Scherrer 2015), “a
  relatively novel technology which allows to effectively represent the intrinsic
  word-level uncertainty” (Prieto 2021));

- an ad hoc question of complexity (“levels of granularity” (Allen 2011), “levels
  of complexity” (Rose 2016)), see Figure 2.


Figure 2: Detail of the community modeling, reconstruction and virtual reality from the map of
Figure 1.

Figure 3: Detail of the community data visualization from the map of Figure 1.


Figure 1 and its details reveal that the epistemological frame of scale in the DH
literature is not yet stabilized. For our part, we think that scale should
be separated from all notions of complexity to clarify its definition and help
tool-makers create the next generation of datascapes (see section 2). Having
reviewed the corpus used for building the map, we are able to identify two distinct
proto-definitions of scale and level as notions:
1. connected to visualization and exploration tasks, scale can be seen as a zoom
   mechanism. Here, scale becomes a design principle, a feature of interactivity.
   By allowing the user to navigate through multiple scales, the ergonomic
   choices behind the datascapes become an empirical and analytical algebra for
   social scientists;
2. the word level is used as a sibling of scale but not as a synonym. In fact, level
   is mostly associated with the notion of complexity of the research subject; that
   is, a reconstruction choice made upstream from the visualization task.


In what follows, we will use these two proto-definitions enriched with Complex 
Systems concepts to introduce our own definitions of what we call: level of obser- 
vation and scale of description. 


4 Level of observation and scale of description 


In order to distinguish between the notions of level and scale, we now use elements
of knowledge from the Complex Systems (CS) literature. As a research domain,
CS naturally addresses the questions of scale, level and complexity as it
aims at studying how collective structures and global dynamics emerge from
individual interactions and how natural or artificial phenomena evolve, connect or
enrich themselves throughout time. Furthermore, the two proto-definitions
outlined in section 3 echo the most recent outcomes of the CS literature.


4.1 Level 


CS scholars have recently made a clear distinction between level and scale (Chava- 
larias 2021). Level is generally defined as a domain higher than scale and scale refers 
to the structural organization within a given level. Choosing a level of observation 
means making a choice of complexity. In Biology, for example, the choice of a given 
level determines what the main entities under study (organs, cells, genes, etc.) are. 
Applied to Digital Humanities, choosing a level means choosing the intrinsic complex- 
ity of the processes we want to analyze. In the case of quantitative Epistemology for 
instance, choosing a micro level of observation means choosing to reconstruct the 
evolution of a given scientific domain (from a corpus of scientific publications) by 
looking at the way this domain has resulted from the temporal and internal combi- 
nations of many sub-research fields. There, these small fields of research can be con- 
sidered as elementary entities of analysis. But choosing a macro level of observation 
instead means choosing to consider larger research domains (for instance, Sociology 
or Biology) as elementary entities of analysis. By doing so, we will focus on the evolu- 
tion of the inter-disciplinary interactions between wide research domains. 


4.2 Scale


For its part, the choice of a scale determines the resolution adopted to describe a 
phenomenon at a given level. The scale can be seen as an exploration principle 
used for zooming through a given visualization. By re-scaling this visualization, 
we explore different scales of description and we navigate among layers from in- 
dividual interactions to macro structures (Lobbe 2021). The choice of a given level 
occurs once and for all during what we call the reconstruction step upstream 
from the visualization task. The goal of the reconstruction task is to model a phe- 
nomenon from a collection of harvested, curated and annotated digital traces. We 
then use these traces to create a mathematical approximation for the phenome- 
non’s structure and behavior (a network for instance). This reconstructed object 
is then projected in a visualization space where it can be explored and described 
throughout different scales. 


4.3 Socio-technical algebra 


Levels and scales can now be combined as distinct operators of a socio-technical
algebra. Such an algebra aims at translating any social science research question
into mathematical formulas comprehensible by computer scientists. These formulas
will eventually be implemented within software or graphically translated
through interactive user interfaces. So, let's call φ a generic data analysis process
(represented by a triangle in Figure 4). φ consists of three standard steps: the
data collection (digital resources in Figure 4), the reconstruction (i.e., the
computer-based modelling of the targeted socio-historical phenomenon) and the
visualization (i.e., the exploration of the reconstructed phenomenon through an
interface by a researcher). The choice of a level occurs during the reconstruction
step. The choice of a scale occurs during the visualization step. According to our
socio-technical algebra, φ can be seen as a function of both levels x and scales y
such that φ = f(x, y) (see Figure 4). Thus, DH scientists first start by harvesting
digital traces related to their research question. Then, they choose a level of
observation that will determine the complexity of the modelling of the phenomenon
under study. Finally, they visualize the complex object (a map for instance)
reconstructed by the computer through an interface. By interacting with this interface,
they will move from one scale of description to another and base their upcoming
analysis upon this choice of scale. In the future, this socio-technical algebra will
be enriched with additional operators. For instance, in Section 6 we will introduce
the order 1 and order 2 metrics: two extra operators that can be combined
with level and scale.
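
To make the separation of the two operators concrete, here is a toy sketch of a process written as a function f(x, y), where the level x is consumed by the reconstruction step and the scale y by the visualization step. Everything in it is a hypothetical illustration: the names make_process, reconstruct and visualize, the two discrete levels "macro" and "micro", and the reading of scale as the number of entities displayed are assumptions made for the example, not part of any existing tool.

```python
from collections import Counter


def make_process(reconstruct, visualize):
    """Compose a data analysis process f(level, scale) from its two steps."""
    def process(traces, level, scale):
        model = reconstruct(traces, level)  # the level is chosen here, upstream
        return visualize(model, scale)      # the scale is chosen here, downstream
    return process


def reconstruct(traces, level):
    """Toy reconstruction: count papers per discipline (macro) or per sub-field (micro)."""
    position = {"macro": 0, "micro": 1}[level]
    return Counter(trace[position] for trace in traces)


def visualize(model, scale):
    """Toy visualization: show only the `scale` most frequent entities."""
    return model.most_common(scale)


# Hypothetical digital traces: one (discipline, sub-field) pair per paper.
papers = [
    ("Sociology", "networks"),
    ("Sociology", "migration"),
    ("Biology", "genetics"),
    ("Biology", "genetics"),
    ("Biology", "ecology"),
]
f = make_process(reconstruct, visualize)
macro_view = f(papers, "macro", 2)  # wide research domains as elementary entities
micro_view = f(papers, "micro", 1)  # sub-fields as elementary entities
```

Changing the level changes what counts as an elementary entity, while changing the scale only changes how much of the already reconstructed object is displayed.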


5 Using level and scale with phylomemy
reconstruction 


In this section, we will give a concrete example of how level and scale can be used 
by DH scholars to reconstruct and explore socio-historical processes. We address 
the following epistemological research question: how can we reconstruct the evolu- 
tion of the scientific landscape of a given research domain throughout time? This 
question assumes that the evolution of science can be reconstructed from the main 
traces left by researchers and scholars; that is, publications and scientific papers. 
As a constraint, we won't use any pre-existing structured resources (citation
graphs, ontologies, etc.) apart from an initial corpus of scientific publications: the
dynamic structure will emerge from the co-occurrence and co-use of scientific
terms and expressions in time. In fact, this research question can be extended to all
types of timestamped corpora of textual documents: how can we reconstruct the
evolution of a corpus of letters? How can we reconstruct the evolution of a literary
genre? How can we reconstruct the evolution of online debates? How can we
visualize the coverage of a targeted public event by newspapers through time?

To answer this question, we will make use of a new scientific object: the phy- 
lomemy (Chavalarias 2021; Lobbe 2021). Phylomemies can be reconstructed on top 
of any timestamped corpora of text data. The phylomemy reconstruction process 
is part of the larger family of co-word analysis approaches: a type of text mining
technique used to analyze the co-occurrences of words, terms or expressions
within texts (Callon 1986). More precisely, phylomemies are designed to reconstruct
the dynamics of term-to-term relationships through time and visualize
the evolution of semantic landscapes. Phylomemies are thus inheritance networks
of textual elements of knowledge. Applied to the analysis of the evolution of
science for instance, phylomemies can be seen as genealogical trees of scientific
fields that structure themselves in evolving branches of knowledge; that is,
sub-domains of research contained within a given discipline.


5.1 Methodology 


Phylomemies first have to be reconstructed from a collected set of documents before
being visualized and explored through dedicated software (see Section 6).
The reconstruction process can be divided into four successive steps:
1) Indexation: we first frame a corpus of texts and then extract its core vocabulary
   (terms or expressions). We next choose a temporal resolution (e.g. 3 years) that
   chunks this corpus into ordered sets of equal periods. Within each period, we
   compute the terms' co-occurrence, i.e., their co-presence in the original documents.
2) Similarity measures: within each period we use the co-occurrence of terms to
   compute a similarity measure. It results in graphs of similarities potentially
   containing meaningful groups of terms frequently used together. We call these
   groups fields.
3) Field clustering: a clustering algorithm is then used to detect coherent fields of
   terms within each period.
4) Inter-temporal matching: an inter-temporal matching mechanism reconstructs the
   kinship relations between fields from one period to another. It assigns each
   group of terms a set of parents and children by using a semantic similarity
   measure. By doing so, we highlight elements of semantic continuity over time
   called branches.
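
The four steps can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of Chavalarias 2021: the similarity measure is replaced by a raw co-occurrence threshold, the clustering algorithm by connected components, and the inter-temporal matching by an unweighted Jaccard index. All function names, thresholds and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def index_by_period(docs, start, step):
    """Step 1: chunk (year, terms) documents into periods and count co-occurrences."""
    cooc = defaultdict(Counter)
    for year, terms in docs:
        period = start + ((year - start) // step) * step
        cooc[period].update(combinations(sorted(set(terms)), 2))
    return dict(cooc)


def fields_of(pair_counts, min_cooc):
    """Steps 2-3 (simplified): keep frequent pairs, then group linked terms into fields."""
    neighbours = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_cooc:
            neighbours[a].add(b)
            neighbours[b].add(a)
    fields, seen = [], set()
    for term in sorted(neighbours):
        if term in seen:
            continue
        stack, field = [term], set()
        while stack:  # depth-first walk over one connected component
            t = stack.pop()
            if t not in field:
                field.add(t)
                stack.extend(neighbours[t] - field)
        seen |= field
        fields.append(frozenset(field))
    return fields


def jaccard(a, b):
    """Unweighted overlap between two fields (stand-in for the semantic similarity)."""
    return len(a & b) / len(a | b)


def match_periods(fields_by_period, min_sim):
    """Step 4: link each field to the similar fields of the next period."""
    periods = sorted(fields_by_period)
    links = []
    for p, q in zip(periods, periods[1:]):
        for parent in fields_by_period[p]:
            for child in fields_by_period[q]:
                if jaccard(parent, child) >= min_sim:
                    links.append(((p, parent), (q, child)))
    return links


# Hypothetical toy corpus of (publication year, extracted terms).
docs = [
    (2014, ["corpus linguistics", "historical linguistics", "language"]),
    (2015, ["corpus linguistics", "historical linguistics"]),
    (2014, ["web data", "archiving"]),
    (2016, ["web data", "archiving"]),
    (2017, ["corpus linguistics", "historical linguistics", "software"]),
    (2018, ["corpus linguistics", "historical linguistics"]),
    (2019, ["web data", "digital images"]),
]
cooc = index_by_period(docs, start=2014, step=3)
fields_by_period = {p: fields_of(c, min_cooc=2) for p, c in cooc.items()}
kinship = match_periods(fields_by_period, min_sim=0.5)
```

With a 3-year resolution the toy corpus yields two periods; the linguistics field persists from the first period to the second and forms a small branch, while the web archiving field has no child.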


5.2 Level and scale with phylomemies 


The phylomemy reconstruction process already considers the notions of level and
scale as defined in section 4: the level can be set up during the reconstruction step;
the scale can be set up during the visualization step. There, the level of observation
has been modelled as a continuous variable λ ∈ [0,1] and a quality function F_λ has
been designed to control the intrinsic complexity of the phylomemy (Chavalarias
2021). F_λ aims at answering the following question: “What should be the global
shape of the phylomemy, so that for any term x I could be able to find an informative
branch of knowledge dealing with x?”. By choosing a level λ in [0,1], we
influence the informativeness of the branches of the resulting phylomemy: they
thus might vary from very precise branches to very generic branches. For a low
level (λ → 0) all fields and terms will be connected within few but large branches;
for a high level (λ → 1) the phylomemy will look like an archipelago of specific and
accurate branches. Then, once the phylomemy has been reconstructed, it is projected
in a visualization space (a dedicated datascape) where researchers can explore
the evolving structure from term relationships to branch similarities and
choose a suitable scale of description and resolution to analyze the whole
object (Lobbe 2021).


6 See Chavalarias 2021 for details concerning the algorithms and text mining techniques used in
the phylomemy reconstruction process.

As we have already harvested a corpus of scientific publications in section 3,
we will now reuse it (along with its list of terms) and reconstruct its semantic
evolution. This will give us clues as to how the domain of Digital Humanities has
positioned itself regarding the notion of scale in the last 15 years. This will enrich our
analysis of Figure 1 with a temporal perspective. We first reconstruct the
phylomemy for a level λ = 0.

Figure 5: A phylomemy of the scientific literature of digital humanities reconstructed for λ = 0.


In the resulting Figure 5, the phylomemy must be read from top to bottom.
Circles represent groups of terms used together in a set of papers at the
same time. The bigger the circle, the larger the number of matching documents.
For readability, we choose here not to display the textual content of the
circles. We highlight in yellow the origins of significant sub-research domains of
DH motivated by classical pre-digital research subjects (archiving, education,
textual criticism, etc.) or new types of digital resources (web data, digital images,
digitized scholarly texts, etc.).

But this phylomemy (Figure 5) cannot be considered informative: its complexity
is too high, and we need to simplify it (i.e., remove weak kinship links) to reveal
more structured shapes. To that end, we now reconstruct the same phylomemy
at higher levels.

Figure 6: The progressive specialization of phylomemies of the scientific literature of digital
humanities reconstructed for λ = 0.2, 0.5 and 0.8.

Figure 6 shows how the phylomemy specializes for subsequent levels λ > 0. To
build this visualization, we first tested many values of λ and then retained the
values for which the complexity of the phylomemy changed significantly, that is:
λ = 0.2, 0.5 and 0.8. Thus, the scientific landscape of DH, as far as our original
corpus of papers is concerned, starts to structure itself
at λ = 0.2: the single and large branch (Figure 5) breaks into smaller branches that
represent satellite sub-research domains of DH like historical linguistics, literary
style analysis, crowdsourced transcription and annotation, etc. We then need to
reach λ = 0.5 to see the emergence of the main Digital Humanities research
branches: digital history, archeology, network analysis. Finally, very specific
branches like text visualization appear and stabilize at λ = 0.8.

By choosing the example of phylomemies, we see how our definition of level
can be used to analyze the same process (the temporal evolution of the DH
literature) at different degrees of complexity and how we influence the intrinsic
structure of the resulting visualization. We finally invite the readers to explore
Figure 6 in our online datascape⁷ to experiment with some multi-scale navigation
mechanisms. Indeed, phylomemies embed an endogenous scaling mechanism
that builds on the kinship links of each branch. The weights of these links (i.e.,
the semantic similarity measure) are sorted and distributed among increasing
ranges that result in a finite number of scales per branch. By moving from one
scale to another within our datascape, scholars can choose a suitable resolution
for each branch and aggregate groups of terms whose link weight is lower than
the selected scale. In Figure 7, we show how this mechanism can be used on
the unnamed grey branch of the phylomemy λ = 0.2 of Figure 6. This branch
embeds about twenty different scales of description.
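
The scaling mechanism described above can be approximated by the following minimal sketch. It makes two simplifying assumptions that are not taken from the datascape's actual implementation: each distinct link weight of a branch is treated as one scale (instead of ranges of weights), and aggregation is read literally as merging the groups connected by a link whose weight is below the selected scale. The function names and the toy branch are hypothetical.

```python
def scales_of(links):
    """Distinct kinship link weights of a branch, in increasing order."""
    return sorted({weight for _, _, weight in links})


def aggregate(groups, links, scale):
    """Merge the groups connected by a link whose weight is below `scale`."""
    parent = {g: g for g in groups}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, weight in links:
        if weight < scale:
            parent[find(a)] = find(b)

    merged = {}
    for g in groups:
        merged.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in merged.values())


# Hypothetical branch: four groups of terms chained by weighted kinship links.
groups = ["g1", "g2", "g3", "g4"]
links = [("g1", "g2", 0.2), ("g2", "g3", 0.5), ("g3", "g4", 0.8)]

branch_scales = scales_of(links)
finest = aggregate(groups, links, 0.2)    # no link is below 0.2: nothing is merged
coarser = aggregate(groups, links, 0.8)   # links 0.2 and 0.5 are merged away
```

In this toy branch, three link weights give three scales; moving the selected scale from 0.2 to 0.8 collapses g1, g2 and g3 into a single aggregated node while g4 stays apart.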


6 Beyond level and scale, introducing GarganText 


Recent technical reviews (Chavalarias 2021; Lobbe 2021) have shown that no software
or datascape is today able to implement the notions of level and scale as
defined in Section 4. That is the reason why we have decided to create our own
complexoscope (see Section 2). We here want to introduce GarganText⁸ (Delanoe 2023),
a free text mining software, heir of De Rosnay's macroscopes (Derosnay 2014); that is,
software designed to reveal the global structure and dynamics of corpora of textual
documents by computing individual interactions (co-occurrence,⁹ conditional
probability,¹⁰ temporal similarity,¹¹ etc.) between terms and expressions. Contrary
to classical datascapes, GarganText is data agnostic: it takes any textual elements as
input data (short texts, novels, corpus of thousands of papers, etc.) and produces
three types of visualizations: basic charts, semantic maps and phylomemies.
Between the initial corpus and the resulting visualizations, GarganText uses the
functional capacity and mathematical stability of the programming language Haskell¹²
to materialize an agile data analysis process. By using a functional programming
language, GarganText allows scholars to control and customize their analysis at
will by choosing between different strategies (implemented as standalone functions
in the source code) and even by going backward to previous steps. The researchers
can thus choose a first level of complexity, jump to a phylomemy, explore its scales
and then go back to another level, and so on. By doing so, GarganText creates a
true continuum of data exploration, from backend functionalities to design
principles. Finally, GarganText enables collaborative and decentralized analysis.


7 The phylomemy reconstructed for level 0.2 can be explored at http://maps.gargantext.org/
phylo/zoomland/. Accessed July 10, 2023.

8 GarganText has been invented and is developed by A. Delanoé at the ISCPIF CNRS, see https://
gargantext.org/. Accessed July 10, 2023.

Discussion. Can we go beyond level and scale? Can we imagine additional
variables to our data analysis process f(x, y)? We think that the answer is yes and
GarganText already implements a linguistic order when one wants to reconstruct
a semantic map. Using the graph functionality of GarganText after having selected
the relevant terms to throw light on, we can analyze two types of
graphs, the one of order 1 and the one of order 2. Each type of graph has its own
interpretation regarding our research purpose.

The order 1 graph is used to approximate the global quality of the corpus.
Analyzing its clusters gives a simple idea of the main picture of the corpus by
detecting possible noise in it. The order 1 graph results in semantic clusters from
the association of terms in conjunction (i.e., terms A and terms B are in the same
textual context). The central clusters show the main topics of the corpus and its
peripheral clusters describe the secondary themes.


9 For instance, in the corpus of, the terms micro-digitisation and sustainability are jointly used 
three times by various authors so their co-occurrence count is 3. 

10 For instance in Figure 1, the weight of the link (i.e., the conditional probability) between
knowledge graph and rdf is 0.8. It means that if we read knowledge graph in a paper, it will very
likely be associated with rdf.

11 For instance in Figure 5, the temporal similarity between the group of terms language,
debates, corpus linguistics, historical linguistics & discourse structure (in 2014) and term
annotations, language, software, corpus linguistics & historical linguistics (in 2017) is 0.65 as they
share 3 out of 5 terms, knowing that these terms are weighted regarding their specificity in the corpus.
12 See https://www.haskell.org/. Accessed July 10, 2023. 

The order 2 graph shows the clusters built from the graph of association of
terms in disjunction (i.e., terms A and terms B that can be used interchangeably
in the same textual context). As a consequence, the clusters throw light on the main
concepts of the current corpus. For instance, Figure 1 is an order 2 graph.

Hence the order 1 and order 2 graphs offer two different types of interpretation.
The first-order graph shows the subjects, which helps to improve both the quality
of the set of documents and the selection of the terms under study. The second-order
graph shows the main concepts highlighted in the corpus for the research goal of the
team working together in the GarganText collaborative working space.
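
The contrast between the two orders can be illustrated with a small sketch. It assumes that order 1 links are plain co-occurrence counts and that order 2 links can be approximated by the cosine similarity of the terms' co-occurrence profiles (a common second-order measure); how GarganText actually computes its order 2 graph is not specified here, so the function names and the measure are illustrative assumptions only.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def order1(docs):
    """First-order links: number of documents in which two terms appear together."""
    cooc = Counter()
    for doc in docs:
        cooc.update(combinations(sorted(set(doc)), 2))
    return cooc


def order2(docs):
    """Second-order links: cosine similarity of the terms' co-occurrence profiles."""
    profile = defaultdict(Counter)
    for (a, b), n in order1(docs).items():
        profile[a][b] += n
        profile[b][a] += n

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
        norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    return {
        (a, b): cosine(profile[a], profile[b])
        for a, b in combinations(sorted(profile), 2)
    }


# Hypothetical toy corpus: "map" and "chart" never appear in the same document
# but are used with exactly the same neighbouring terms.
docs = [
    {"map", "zoom", "interface"},
    {"chart", "zoom", "interface"},
]
first = order1(docs)
second = order2(docs)
```

In this toy corpus, map and chart have no order 1 link because they are never used together, yet their order 2 similarity is maximal because they occupy the same textual context.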

To conclude, we want to recall that level and scale, as defined in Section 4,
are two orthogonal notions of a much more complex socio-mathematical
algebra meant to mine digital traces. In the future, we invite DH scholars to enrich
this algebra by defining new notions or objects (such as order 1 and order
2 graphs) along with interoperability operators: for instance, the level is a higher
order entity than the scale. GarganText already lays the groundwork for more
innovative analytical variables that will take advantage of the power of reflectivity
and connectivity of digital resources.


Francis Harvey and Marta Kuźma

Zooming is (not just) Scaling:
Considerations of Scale in Old Maps 
from Cartographic Perspectives on 
Generalisation 


Abstract: This chapter examines how cartographic concepts of scale are devel- 
oped to help provide a framework for traditional map production including gen- 
eralization. Contemporary digital humanities researchers rarely have knowledge 
of this framework and practices. Geographers work closely with cartographers, 
leading to map scale possessing a wide array of epistemological and ontological 
characteristics that are inherent and even implicit. Map scale is essential to other
fields. The digitization of traditional maps and the creation of scanned map
applications for computers in the 1990s opened new types of access. Combined with
scale-dependent raster graphics, the convincing illusion of zooming through 
space developed as the primary interface for digital maps. The informationaliza- 
tion of geographic knowledge and cartographic functions has extended zooming 
to these maps, although these digital maps are distinct from traditional paper 
media maps. Theories and methods of historical interpretative research extend 
new possibilities for DH research to consider cartographic generalization and its 
manipulations indicated by scale. 


Keywords: generalisation, cartographic scale, historical maps, old maps, map 
interpretation 


1 Introduction 


Cartographic scale can become a surprisingly challenging concept considering
contemporary possibilities for zooming (and panning) digitized old maps. Working
with digitized copies of old maps, which researchers can easily find, ignores the
condition of old maps, and this approach runs the risk of misinterpretation.
Considering scale in historical interpretative work involves many issues. Addressing these
issues, this contribution draws on cartographic research, provides some background
into cartographers’ concept of generalization and the process for making
graphic alterations for different scales and purposes, and suggests some analytical
techniques to gain helpful insights into potentially significant alterations of old
maps. These insights are primarily directed at digital historical researchers yet have
relevance for curators and others in the public humanities working with old maps
and seeking helpful insights to make old maps more accessible to any audience. For
readers, we stress that our focus is on “Generalization”, the term for cartographic
operations that change cartographic features, usually from large-scale maps of
smaller areas to smaller-scale maps of larger areas. These operations can make
fundamental and even possibly arbitrary changes to the representation of map
features. A generalization can correspond to scale directly, but its relationship with
scale involves more complex representational and epistemological aspects. Changes
to a map through the processes of generalization are often also a matter of a map's
intended function or use. We point out in the contribution that cartographic scale
refers, on the one hand, to a metric relationship; on the other hand, it also refers to
semiotic graphic choices that are complex to ascertain and that arise from the
selective implementation of generalization operations that alter shapes, change
locations and even refigure the relationships among map elements. Cartographic
scale, carefully considered in both the creation and consideration of maps, is a very
demanding topic. Of course, generalization is just one part of the cartographic
process and production. Projections can famously produce very significant changes
in the shapes of cartographic features. Our focus on better understanding the impacts
of generalization in historical maps reflects the significance of its consequences
(see Figure 1).

the editors of the Zoomland book and the helpful comments provided by participants of the
Zoomland workshop in June 2022.


Figure 1: Three old maps (from A. Palairet (1765), B. Perthées (1794), C. “Plan of the Warsaw 
suburbs” (1836)) showing different scales and generalisations of the Vistula (Wista) River. 

In our contribution to this book, we also place the changes arising from
generalization operations in the context of contemporary zooming capabilities, with
which most readers today are likely more familiar. Zooming a digital map (mostly a
scanned version) involves a very different epistemology for a map reader who can
now interactively zoom around a digital(-ized) map. Generalization may occur
during zooming, but is not intrinsic to an interactive map reading process focused
on the user's ability to zoom arbitrarily and pan. In contrast, old maps were
prepared and generalized for a so-called “bird's eye view”, but as D. Haraway suggests,
the view from nowhere requires learning to read and interpret the map in its
preformed objectivity and perspective (Haraway 1988: 575).

A central point in this contribution is that traditional cartographic generaliza- 
tion might seem even whimsical to the public humanities and digital historians. 
Still, cartographers of past epochs were mostly bound by partially defined but 
often idiosyncratic design processes and rules. In more traditional handwork- 
orientated work organizations producing maps with various print processes, the 
more expert cartographers provided guidelines and checked the map in produc- 
tion; these were usually strictly enforced in the hierarchical organization of the 
workplace. Readers (or users) getting these cartographic products and using 
maps before digital versions became available did not have to learn how to read 
the alterations made in the generalization process. Still, changes to other map 
products meant readers relying on maps had to become experts (implicit or ex- 
plicit) in particular maps or map series. In other words, generalized maps were 
logocentric, bound up with cultural and institutional values in straightforward 
ways (Abend and Harvey 2017). We still experience this when we use unfamiliar 
maps of specific fields (such as city fire department maps, electrical utility maps 
or cadastral maps). We might not understand the guidelines, but we may be able 
to assess the impacts on our understanding of old maps. In contrast, our current 
use of most digital maps (especially Google Maps and similar applications on mobile
devices) involves an egocentric experience of map media; we quickly learn
to apply this interface that allows us, as in so many other graphical applications, 
to zoom and pan. The activity of zooming is always related to a sensory experi- 
ence and rests on digital habits of visual engagement. In the most significant epis- 
temological way, working with logocentric media, such as old maps, requires us 
to learn some of the logocentric approaches from another way of presenting and 
positioning map elements we now frequently rely on. 

Considering scale in interpretative historical work involves many issues, and so
does the approach we present. This contribution aligns with spatial humanities
research, specifically taking up cartographic research, and with the digital public
humanities. We understand that an important and relevant goal of the digital public
humanities is the engagement of participants in a historically contextualized
conversation, as in the Photogrammar project (Cox and Tilton 2019). Importantly, for our
contribution to this book, most of the place-related knowledge from the past for
working with old maps is now lost. In the approach we introduce here, we turn to
some metric measures to assess and thus better understand past generalizations'
effects on old maps.


1.1 Generalization exemplified 


Considering the effects of generalization operations as the implementation of mainly
scale-related changes in historical interpretative work involves many issues. These
issues can be significant for historical researchers using old maps, particularly
topographic maps from the 17th through 20th centuries (see Figures 2 A-C and 3 A-C).


Figure 2: A. Part of the topographic map in 1:10,000 (Rzeszów sheet), B. The same area on a 1:50,000
map (Rzeszów-Zach). C. Part of a 1:50,000 map enlarged to 1:10,000 (Ostrowski 2003). The didactic
pair of maps illustrates the numerous effects of scale-related changes in all traditionally prepared
maps.


Figure 3: A. Part of topographic map in 1:10,000 (Warsaw, sheet 263.343, 263.344, 1982), B. Part of a
1:25,000 map (Warsaw, sheet 263.34, 1981) enlarged to 1:10,000, C. Part of a 1:50,000 map (Warsaw,
sheet 263.3, 1973) enlarged to 1:10,000. The reduction of graphic information and changed emphasis
on road connectivity, large buildings and monuments for orientation purposes is common in the
generalization of smaller-scale topographic maps.


Scale-related changes in generalized maps can be significant. Generalization operations
can alter locations, relationships and even the graphical presentation of places (see
Figures 1-3). During generalization, cartographic elements move,
change shape and combine with other elements. In Figure 2, for example, note 
how the bushes and paths of the square disappear or how street areas are ab- 
stracted from the 1:10,000 to 1:25,000 and 1:50,000 scale maps. Depending on the 
scale and often the projection, generalization is a step in the traditional carto- 
graphic process to enhance the map's legibility for its intended use. As we show, 
the resulting challenges of distortions, errors and omissions involve specific 
changes that, once recognized, can always be considered in an interpretative re- 
search process and frequently quantitatively analyzed to assess their epistemolog- 
ical significance and consequences for the research. 

As we suggest here, the generalization of old maps is almost always challenging to
understand in the digital humanities. Its near contradiction of familiar ways of
thinking of maps as veridical representations of (some of) the world makes it an
exciting way to access old maps in the public humanities and to consider what has
changed over time as distinct from what has changed in the cartographers'
representations. Further, critical cartography literature (Crampton
and Krygier 2005; Harvey et al. 2005; Michel 2017; Pickles 2012) offers insights into 
the roles of projections in global and world regional maps, the rich semiologies of 
map symbols and the potential of map layouts to focus readers' attention and 
help remove areas from visual perception. However, the significant alterations of 
generalization receive only scant attention in these critical engagements. Fortu- 
nately, in seeking to automate the processes of generalization, some cartogra- 
phers have produced some exciting studies and approaches that we draw on to 
help generate some statistics and comparisons to at least improve our understanding of
generalization's effects on a historical map.

After reviewing generalization concepts and presenting examples, the paper
points to available resources for analysing generalization’s effects. We see this as
a further contribution to the spatial humanities (Bodenhamer et al. 2010) and the 
critical assessment of the geographic information system (GIS). Before concluding, 
we draw on the macroscope concept (Börner and Polley 2014) to describe a frame- 
work for using these tools in a prototype to support interpretative historical anal- 
ysis of old European topographic maps, which have now been made digitally 
available as graphic rasters or vector graphics. The conclusion summarizes key 
issues and discusses future research topics and methodological developments. 


2 Cartographic generalization: Concepts 
and relevance 


The issues related to scale in historical interpretative work can be well understood
by considering cartographic concepts of generalization. The first principles of
cartographic generalization taught to cartographers reflect the commonly used
definition of cartography as the art and science of map-making (Lapaine et al. 2021).
In the production of paper maps, generalization would frequently have guidelines for
that particular type of map, its intended uses and distinct stylistic elements.
However, the actual process was done by individuals, following guidelines or rules, but
usually with some flexibility and tolerance. It was not unusual for specific
individuals with more experience or skills to assist in the process and to organize
generalization to distribute the work based on skills and competencies (Meyer 2021).

Ratajski’s (Ratajski and Lipinski 1973) well-known work in cartography divided
generalization into content and quality. Content generalization connects to
selection-based semantic operations, while quality generalization relates to the
classification and enhancement dimensions of symbolization and graphic map design.
Generalization operators with rules based on Polish cartographic literature, which has
strong parallels to cartography texts in other languages and countries, suggest several
criteria to consider for each generalization operation (Ratajski and Lipinski 1973):

1. Size (Figure 4) - the number of people for cities and the frequency of running
   trains for a railway. The length of line objects or the area of polygons decides
   which line or area is on the map (Paslawski 2010), e.g. lakes under 1 mm² in area at
   map scale and rivers under 1 cm wide at map scale should be omitted
   (Grygorenko 1970).

2. Function - a city can have administrative (capital city), communication (railway
   junction), trade, industry (industrial city), tourist or educational functions;
   roads can have technical functions, e.g. state roads, voivodeship, provenance
   roads, roads which are the best connection between cities represented on the
   map; for railways, central railways (international, express) should be presented on
   the map, while narrow-gauge railways can be omitted (Paslawski 2010).

   a. Historic-traditional - places which were interesting or important in the past,
      e.g. Wislica, Pompei.
   b. Centrality - when one particular place significantly influences others, e.g.
      capital city, car manufacture. Small cities in sparsely populated areas should
      appear on the map, but not necessarily those near a large city. In urban areas,
      we can omit even big cities because there are bigger ones around them
      (Paslawski 2010).
   c. Up-to-dateness - when a place is important because of current events or
      situations, it should be presented on the map, e.g. the headquarters of an
      international organization (Geneva - United Nations (UN)) or places of art
      festivals.
   d. Tendency to changes - objects which show a development tendency, e.g. roads and
      highways, are retained, but railways can be omitted.
   e. Typically - we should keep shorter rivers which are close to state borders and
      those rivers flowing through big cities. Retain the density of the hydrographic
      network (Paslawski 2010).
   f. Selection rule - square root law (Töpfer and Pillewizer 1966).

3. Classification (Ratajski and Lipinski 1973):

   a. Qualitative - merging objects based on qualities (attributes), e.g. a system of
      classification such as land cover classification: types of tree cover (deciduous,
      coniferous or mixed forest) are presented on large-scale maps, whereas
      small-scale maps show forests that embrace all types (Grygorenko 1970).
   b. Quantitative - grouping numbers into classes, e.g. the number of people in a
      city.
   c. Reference area.

4. Simplification (Paslawski 2010):

   a. Line - we can leave slight bends of rivers and shorelines. Shorelines have the
      most details and are elaborated first (as the first object).
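
Two of the criteria listed above lend themselves to a small worked example: the size threshold (objects below a minimum area at map scale are omitted) and the square root law. The sketch below is illustrative only. It assumes the basic form of the Töpfer and Pillewizer law, n_f = n_a * sqrt(M_a / M_f), where n_a is the number of objects on the source map and M_a and M_f are the scale denominators of the source and derived maps; the function names are hypothetical and do not come from the cited literature.

```python
import math


def radical_law(n_source: int, source_denominator: float, target_denominator: float) -> int:
    """Expected number of retained objects under the basic square root law.

    n_f = n_a * sqrt(M_a / M_f), with M_a and M_f the scale denominators
    of the source map and the derived (smaller-scale) map.
    """
    return round(n_source * math.sqrt(source_denominator / target_denominator))


def min_ground_area_m2(map_area_mm2: float, scale_denominator: float) -> float:
    """Ground area in square metres covered by a given area on the map sheet."""
    metres_per_map_mm = scale_denominator / 1000.0  # 1 mm on the map, in metres on the ground
    return map_area_mm2 * metres_per_map_mm ** 2


# 100 objects on a 1:25,000 map; how many would the law retain at 1:100,000?
print(radical_law(100, 25_000, 100_000))

# A lake of 1 mm² at 1:50,000 corresponds to this ground area in m².
print(min_ground_area_m2(1.0, 50_000))
```

Such figures are expectations only. As the following paragraphs stress, actual generalization decisions were pragmatic, so a count that departs from the law is a prompt for interpretation and not an error.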


The breadth, depth and overlaps of the criteria make plain that even after hundreds of
years of cartography, no clear rules or guidelines define how to generalize or even
provide "best practices." Instead, as cartographers describe the field more generally,
generalization involves artistic and scientific dimensions. The choices in
generalization are effectively limitless. What matters is the graphical quality of the
resulting map, especially its legibility. Scale is central to understanding
generalization, but the actual transformation of graphic representations will be
pragmatic regarding production constraints, (considered) user requirements and existing
conventions. Generalization is more than a veil or grid; it can involve countless
choices and considerations beyond an inventory or single theoretical analysis. For
source criticism, we need to take up generalization as a complex and inaccessible
process used with all old historical maps. Focusing on the products instead can open
the door to understanding consequences and comparing old maps.


Figure 4: Lakes on two example maps (A. Sikorski and Litografia Artystyczna (1904), at 1:1,000,000
scale; B. Topograficzna Karta Królestwa Polskiego (Richter 1839)).


2.1 Related work 


The broad range of cartographic production environments and contingencies 
meant a pragmatic approach to map creation defined contemporary cartographic 
textbook presentations of scale and generalization for the practical demands of 
map production. For example, in his guidelines for cartographers in British colo- 
nial territories, Arthur Hinks assesses scale in terms of a balance between show- 
ing as much as possible at a particular scale and making as much as possible 
from them (cartographically) (Hinks 1947). More recent cartographic texts high- 
light the distortions that must accompany creating maps (Jones 1997). Scale is re- 
lated to the process of generalization that must increase graphical abstraction 
while retaining the necessary graphical meaning of the information conveyed 
through symbols and geometrical shapes. Semantics stands out but is never a 
measure or even a concept. 

Not knowingly indicating a future post-structuralist engagement with the roles 
of the reader, Hinks, in writing for creators and users of colonial maps and here 
relevant to the reconstruction of generalization, stresses the logocentric nature of 
map use: “it [facility in map reading] can only be attained by constant practice in 
the field, and it is useless to attempt to lay down many rules for it” (Hinks 1947: 41).

The pragmatic focus with considerations of economics and efficiency aligned with
aesthetics and quality objectives meant cartographic generalization concepts were 
an advanced topic for training future cartographers. Another emphasis was on on- 
the-job learning at the United States Geological Survey (USGS), writes Patrick 
McHaffie in a valuably insightful study of cartographic production (McHaffie 2002). 

Given the overwhelming pragmatic orientation of map production and hence of 
generalization, historical research considering generalization’s impacts must carefully
consider the contexts and contingencies of the map producers and the researchers. The
latter’s theoretical frameworks, institutional affiliations and goals are as crucial in
interpreting generalization in old maps as the same factors were for map producers of
earlier periods. Whereas a textbook on cartography from the era of Hinks situates map
production in a colonial context, more recent cartography publications, drawing on
information age concepts, focus on computational approaches to enhance generalization
and, if possible, describe it in terms of separate functions,
which can be broken down into algorithmic components for implementation in digi- 
tal map production processes (Chaudhry et al. 2009). A common thread for present- 
day interpretation of old maps is present in the cartographic, logocentric under- 
standing and presentation of geospatial phenomena, which defined generalization 
for cartographers in terms of goals, data quality and, above all, legibility. 

Jean-Claude Müller and co-authors point out these issues in a framework in 
their 1995 edited collection. The introductory discussion offers insights into pro- 
duction issues and challenges for large national mapping agencies and the role of 
academic research. The so-called cartographic gaze (Müller et al. 1995) is less
immediately evident in generalization. There is no single cause but a modern, even
instrumental, focus on objectives in a complex system, further divided into a
broad range of categories by academics. The implementation can lead to contra- 
dictions. A collection of modelling operators became inventories of functions pro- 
vided by early commercial GIS software companies. This change, a harbinger of 
new software-based capabilities, so-called toolboxes, for specific implementa- 
tions, drew on existing cartographic knowledge from mapping agency guidelines 
and challenges of extracting procedural knowledge, described abstractly earlier. 
However, researchers envisioned logical programming or hypergraphs as poten- 
tial technologies to develop systematic generalization using computers (Müller 
et al. 1995). 

Legibility of the results at smaller scales remains the focus of generalization, but
data quality is becoming more crucial in digital approaches to generalization. Metrical,
topological and semantic accuracies are all aspects of data quality, but implementation
in GIS was the more important matter for pragmatic cartographic approaches.


436 — Francis Harvey and Marta Kuzma 


The research on automated map generalization offers starting points and 
techniques for analyzing old maps, which we will take up in the next section. Be- 
fore moving to that analysis, we first briefly consider some relevant epistemologi- 
cal aspects to help distinguish and understand map generalization with actual 
examples. 

The epistemological aspects of scale and generalization, taken in the context 
of any cartographical representation, are complex, but a brief contextualization 
is due. Contemporary approaches to generalization emphasize instrumental func- 
tions that should achieve goals expressed by metrics and visual assessment (Lee 
1996). The goals are rarely simple, but the instrumental approach lends itself to 
an iterative process that can be refined and scaled to production needs and goals 
using digital tools. While legibility and truth are over-arching aspects, pragmatic 
concerns focus on evaluation and more functional concerns gain the upper hand 
in considerations over epistemological aspects. With the pragmatic emphasis on 
efficient production, legibility and truth (see discussion of Müller et al. 1995) were 
in, the development of automated generalization is most often part of carto- 
graphic critique (Harley 1989). Most generalization efforts seldom took up the 
challenge of finding a compromise of legibility and veridicality. 

To give this consideration some context and digital humanities relevance, we 
analyzed the Vistula River to explain cartographic line and cartographic visuali- 
zation for this research. We compare the length of the Vistula river based on nine 
maps of the area around Warsaw (Footnote: All maps were previously digitized, 
making comparisons straightforward in a GIS) presented in Table 1 and Figure 5. 
We present the results of the comparison in Figure 6. Several developments 
among surveyors and mappers are relevant for interpreting these maps. In 1775 a 
corps of crown engineers measured fortifications and offensive buildings; they 
had to build roads and bridges and create maps. They also worked on establishing 
the borders of Poland between Prussia, Russia and Austria (Bartoszewicz 2020). 
Thanks to this, the length of the Vistula river was surveyed more accurately. 
From 1832 to 1865, Prussians advanced triangulation with support from Russian 
surveyors. They developed a more accurate triangulation including observatories 
in Krolewiec and Dorpat (Tartu) as part of the connection between the west and 
Russian triangulations (Kryński 1970). The length of the Vistula river ranges from
968 to 1,070 km. In the most recent of the nine maps we consider (2012), its length is
1,027 km.

Figure 5: Portions of nine maps of approximately the same area, including Warsaw (16th century
(Atlas Fontium 2022), 1676 (Speede and Dirck 1676), 1765 (Palairet 1765), 1796 (Dunn 1796), 1813 (A
Map of Poland: Engraved for the Military Chronicle 1813), 1871 (Baraniecki 1871), 1914 (Map of the
Polish lands from the Oder to the Dnieper, from the Carpathians to the Baltic Sea and the Dvina,
1914), 1944 (Bartholomew and Sekcja Wojskowego Instytutu Geograficznego 1944), 2012 (EU-Hydro -
River Network Database - Copernicus Land Monitoring Service, 2012)). We used digitized versions of
these maps to compare the length of the Vistula and of the part of the Vistula close to Warsaw.

Figure 6: The Vistula river as extracted from each of the nine maps (Figure 5); the corresponding
date of each map is indicated.
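
The length comparison described above can be sketched in a few lines once a river has been digitized as a sequence of longitude/latitude vertices. The following is a minimal sketch under stated assumptions, not the procedure used for Table 1: it approximates the Earth as a sphere, the function names are hypothetical, and the zigzag coordinates are invented purely to show that a line with fewer vertices measures shorter. Only the lengths of 968, 1,070 and 1,027 km are taken from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(p, q):
    """Great-circle distance between two (lon, lat) points given in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    dlon, dlat = lon2 - lon1, lat2 - lat1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def polyline_length_km(vertices):
    """Sum of segment lengths along a digitized line."""
    return sum(haversine_km(a, b) for a, b in zip(vertices, vertices[1:]))


def relative_deviation_percent(length_km, reference_km):
    """Signed deviation of a measured length from a reference length."""
    return 100.0 * (length_km - reference_km) / reference_km


# Invented zigzag line and a simplified version keeping only its end points.
detailed = [(21.0, 52.0), (21.1, 52.1), (21.2, 52.0), (21.3, 52.1), (21.4, 52.0)]
simplified = [detailed[0], detailed[-1]]
print(polyline_length_km(detailed), polyline_length_km(simplified))

# Shortest and longest lengths reported in the text against the 2012 value.
for length in (968, 1070):
    print(length, round(relative_deviation_percent(length, 1027), 1))
```

Relative to the 2012 value, the reported extremes deviate by roughly minus six and plus four per cent, which gives a first sense of how much line simplification and survey quality together can matter.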


Table 1: The length of the Vistula river from the nine maps compared.


3 Heuristics for interpreting old maps


3.1 No rules, no guidelines, only interpretation with some 
metrics 


Given the epistemological complexity of generalization and the pragmatic focus of
cartographers creating generalized maps, we approach heuristics for considering the
effects of generalization in the practical interpretation of old maps. The grounding
assumption is that the generalization of old maps followed specifications and
conventions. Yet, generalization practices could be idiosyncratic. Therefore, old maps
must be analyzed for generalization effects before quantitative analytical approaches
can establish systematic changes. The actual changes created in a generalization process
are impossible for most people to reconstruct in a reasonable amount of time. How- 
ever, researchers can use metric measurements to assess the impacts of generaliza- 
tion and gain the necessary insights to interpret generalization's impacts on a 
particular historical map. These changes can have significant implications for histor- 
ical interpretation. 

The three metrics we suggest using for the historical interpretation of old maps, each
addressing a different type of generalization and reflecting the earlier considerations
of generalization's impacts, are:
- Counts of features in different maps for comparisons of generalization impacts
  (Elimination)
- Use of shape metrics from computer graphics to assess and compare features in
  multiple maps (Simplification)
- Calculation of displacement using a verified reference or contemporary data
  (Displacement)


These metrics offer the potential of being comprehensible to most historical re- 
searchers with limited knowledge of cartographic production processes. How- 
ever, we need to point out that the presentation here is preliminary. These 
metrics emerge from automated map generalization research (van der Poorten 
et al. 2002) and will need to be refined and often reconfigured and implemented 
to assess the impacts of generalization in old maps. This work is a task far beyond 
the scope of this paper but certainly the direction for future research. We also are 
developing usability assessments to guide the development and refinement of the 
implementation.


3.2 Counts of features in different maps for comparisons
of generalization impacts (Elimination)


Elimination is perhaps the most brute-force generalization operation. A selection
follows specific criteria (in simple versions, just the size of a feature), and the
selected features, usually those smaller than a particular size, are removed. An
example of the effects of this operation can be seen in Figure 3 above
and Figure 7 here. The concept is straightforward, but the application calls for 
restraint and careful iterative trials. Still, given its speed and ease of implementa- 
tion, it is more frequently used in digital cartography. Traditional cartographic 
production required a more judicious application together with other generaliza- 
tion operations due to elimination's destructive consequences, which could be 
daunting to correct in the traditional production process. 
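
The counting metric can be illustrated with a short sketch. It assumes that the features of two maps of the same area have already been vectorized and labelled with a feature class; the class names and counts below are invented for illustration and are not taken from the Warsaw maps discussed here, and the function name is hypothetical.

```python
from collections import Counter


def retention_by_class(larger_scale_features, smaller_scale_features):
    """Share of features of each class that survive on the smaller-scale map.

    Both arguments are iterables of class labels, one label per feature.
    A value of 1.0 means nothing was eliminated; 0.0 means the class vanished.
    """
    before = Counter(larger_scale_features)
    after = Counter(smaller_scale_features)
    return {cls: after.get(cls, 0) / count for cls, count in before.items()}


# Invented example: labels of the features digitized from two map sheets.
larger = ["path"] * 8 + ["building"] * 20 + ["church"] * 2
smaller = ["path"] * 2 + ["building"] * 5 + ["church"] * 2

for feature_class, share in retention_by_class(larger, smaller).items():
    print(f"{feature_class}: {share:.2f}")
```

Reporting retention per class instead of a single total makes selective elimination visible: in this invented case landmarks survive while paths and ordinary buildings are thinned, the kind of pattern a reader would then weigh against the map's intended use.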


Figure 7: Two maps (from 1915, scale 1:25,000 (Warschau 1915) and 1943, scale 1:20,000 (Stadtplan 
Warschau 1943)) showing elimination. The right panel shows the elimination of the paths in Royal 
Castle Garden (1) and the left panel shows the elimination of some slope information (2, 3).


3.3 Use of shape metrics from computer graphics to assess
and compare features in multiple maps (Simplification)


For vectorized old maps, shape metrics offer a well-established approach from 
computer graphics and automated map generalization (Basaraner and Cetinkaya 
2017; Brus et al. 2014; Fan 2012; Visvalingam 2016) to assess changes to the shapes 
of features. While, to some extent, with some training and sufficient time, many 
changes can be visually located, shape metrics do this more quickly and will de- 
tect many changes that most people readily overlook. Computer graphics have de- 
veloped many shape metrics (Ware 2008; Ware 2021) because of their importance 
in automated visual analysis. While this area has moved to computational neural
networks to improve accuracy and speed, automated generalization frequently relies on
earlier work to account for the complex geometric changes that come with simplification
operations, involving both Euclidean geometry and orientation changes among elements of
individual features (see Figure 8). While these techniques may still be more cumbersome
than neural networks, they are possibly easier to understand. Still, the equations’
complexity may prove a barrier to implementation in historical interpretative research.


Figure 8: Two maps (from 1915, scale 1:25,000 (Warschau 1915) and 1943, scale 1:20,000 (Stadtplan
Warschau 1943)) showing simplification. The left panel shows the outline of churches simplified to
symbols (on the map St. John's Cathedral) and the right panel shows generalized built-up areas
instead of indicating particular buildings (5).
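
To make the idea of a shape metric concrete, the sketch below computes three simple descriptors for a vectorized polygon: vertex count, area and an isoperimetric compactness ratio, 4πA/P². The text does not name specific metrics, so this choice is an assumption made for illustration, and the two building outlines are invented. Coordinates are assumed to be planar (for example, metres in a projected coordinate system).

```python
import math


def perimeter(ring):
    """Perimeter of a closed polygon given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:] + ring[:1]))


def area(ring):
    """Polygon area by the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))
    return abs(twice_area) / 2


def compactness(ring):
    """Isoperimetric ratio 4*pi*A / P**2: 1.0 for a circle, lower for intricate shapes."""
    return 4 * math.pi * area(ring) / perimeter(ring) ** 2


# Invented outlines: an L-shaped building and its simplification to a square.
l_shaped = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]

for name, ring in (("larger scale", l_shaped), ("smaller scale", square)):
    print(name, len(ring), area(ring), round(compactness(ring), 3))
```

In this invented pair the simplified outline has fewer vertices, a larger area and a higher compactness. Comparing such values for the same feature across map sheets gives a quantitative trace of simplification that is easy to overlook by eye.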


3.4 Calculation of displacement using a verified reference or 
contemporary data (Displacement) 


Displacement is often an essential operation in cartographic generalization: it retains
relationships among cartographic features that correspond to experiences, while
altering their locations or distorting their shape (see Figure 9). Assessing
displacements from larger-scale to smaller-scale maps relies on applying the functions
developed to displace cartographic features in map generalization. These approaches
again make use of metrics that describe line geometry and shape.


Figure 9: Two maps (from 1915, scale 1:25,000 (Warschau 1915) and 1943, scale 1:20,000 (Stadtplan 
Warschau 1943)) showing displacement. The left panel shows the displacement of a railway (6) and 
the right panel shows this displacement as well as the displacement of a road.


At the level of features, the assessment of displacement is computationally involved
and requires vectorization and differentiation of different types of
features, which can be very complex. A more straightforward but relevant analy- 
sis of displacement is possible at the map sheet level. Also, researchers must as- 
certain if ageing, warping or fundamental geodetic differences between the old 
map in question and other maps or recent maps influence displacement. The soft- 
ware package MapAnalyst offers this functionality (see Figure 10). 
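
A map-sheet-level assessment of this kind can be sketched as follows. The code is not MapAnalyst's implementation or API; it is a minimal illustration, under our own assumptions, of one common approach: fit a four-parameter similarity (Helmert) transformation between control points on the old map and the same points in a reference dataset by least squares, then treat the residual vectors as local displacement once the overall shift, rotation and scale have been removed. All point coordinates are invented.

```python
import math


def fit_helmert(src, dst):
    """Least-squares 2D similarity transform: X = a*x - b*y + tx, Y = b*x + a*y + ty."""
    n = len(src)
    mx = sum(x for x, _ in src) / n
    my = sum(y for _, y in src) / n
    mX = sum(X for X, _ in dst) / n
    mY = sum(Y for _, Y in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (X, Y) in zip(src, dst):
        dx, dy, dX, dY = x - mx, y - my, X - mX, Y - mY
        num_a += dx * dX + dy * dY
        num_b += dx * dY - dy * dX
        den += dx * dx + dy * dy
    a, b = num_a / den, num_b / den
    tx = mX - (a * mx - b * my)
    ty = mY - (b * mx + a * my)
    return a, b, tx, ty


def residual_vectors(src, dst, params):
    """Displacement of each reference point from its transformed old-map position."""
    a, b, tx, ty = params
    return [(X - (a * x - b * y + tx), Y - (b * x + a * y + ty))
            for (x, y), (X, Y) in zip(src, dst)]


def rmse(vectors):
    """Root mean square length of the displacement vectors."""
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in vectors) / len(vectors))


# Invented control points: old-map coordinates and reference coordinates.
old_map = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
reference = [(10.0, 20.0), (12.0, 20.0), (12.0, 22.3), (10.0, 22.0)]

params = fit_helmert(old_map, reference)
vectors = residual_vectors(old_map, reference, params)
print("scale:", math.hypot(params[0], params[1]))
print("rotation (deg):", math.degrees(math.atan2(params[1], params[0])))
print("RMSE:", rmse(vectors))
```

Separating the global transformation from the residuals speaks to the caveat in the paragraph above: a uniform shift, rotation or scale difference points to geodetic or sheet-level causes, whereas large residuals at individual points are candidates for generalization-related displacement, warping or survey error, and still require interpretation.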


Figure 10: Georeferenced map showing the displacement of the points that were used for
georeferencing. Vectors show the displacement.


3.5 Heuristics (initial) for interpretation in the digital 
humanities 


In historical research involving old maps, assessing generalization's impacts involves
multiple considerations of potential relevance. Although there are problems with a
scheme that goes from more to less complex, we hope that simplifying the issues to
consider in heuristics of old map interpretation provides a valuable framework for the
many uses of old maps in historical research, e.g. in museum work. These heuristics are
clearly starting points. Much work is needed in specific contexts and to systematically
consider scale in interpretative historical work.


4 Summary and outlook 


This contribution describes a conceptual framework for assessing generalization’s 
impacts on old maps for historical interpretative research, curation and wider hu- 
manities discourse. Clearly, given the limited work to assess generalization in old 
maps, this chapter should be seen as a beginning, not an end. Looser stressed that 
public humanities should seek many audiences, not just scholars, academic audi- 
ences or the smartest readers, viewers or listeners (Looser 2019). The assessment 
of map generalization could give a broader audience a chance to understand 
maps in a more confident way. Further, this contribution takes up issues for
historical cartographic research and considerations of both quantitative and
qualitative uncertainty in showing features from old maps (Figure 11).


Figure 11: A sample mockup for a user interface for exploring old maps, with controls labelled
"Choose background map" and "Choose overlay map".


Further work on considering scale in historical interpretative work and its relevance
to digital humanities calls for implementing the described metrics in an accessible
interface. For this, we are commencing usability studies. Also, the considerations of
generalization operations here are limited to what are, in our experience as
cartographers, the main generalization operators in terms of their consequences and,
therefore, significance for historical interpretative research. We are intrigued by the
limited consideration of generalization in historical research with old maps and strive
to develop a more thorough analysis of old maps and the impacts of generalization
operations.

A central epistemological issue for interpretative research in digital humanities 
work with historical maps remains the relationship between cartographic features 
and historical objects and processes. Due to generalization, old maps will always 
have limited value in addressing these questions but can provide unparalleled in- 
formation when other resources are limited or their accuracy is in question. Thus, 
old maps can be invaluable even with the noted limitations, and considering gener- 
alization's impacts on the maps can provide additional relevance. 

The future research directions for considering scale in historical interpretative 
work involve pragmatic and theoretical issues. Zooming is (not just) scaling. Prag- 
matically, in the digital source critique, it is essential to clarify how present-day 
comparisons using current online maps and old maps can lead to numerous errors. 
The cartographic features of an OpenStreetMap (OSM) map can greatly bias
interpretations of old maps if current approaches to map production and the differences
between zooming and scaling are not understood. Zooming challenges historical analysis
to understand what has not changed in the graphics, giving a sense of getting close to
graphical elements when zooming in and of an overview when zooming out. Considering
multiple historical maps at different scales, by contrast, benefits from a better
understanding of the generalization changes made to support visual communication and of
the related graphic changes, which can lead to very different graphic presentations in
historical maps. The consequences for historical interpretative research can be
considerable.


List of abbreviations 


GIS geographic information system
NAWA National Agency for Academic Exchange
OSM OpenStreetMap
UN United Nations
USGS United States Geological Survey


Weather Map: A Diachronic Visual Model for Controversy Mapping


Abstract: The Weather Map is a visual model to investigate public debates on 
media. Relying on the Media Cloud archives, the visual model transforms a simple 
query into a sophisticated visualization by employing the visual grammar of synop- 
tic weather charts. Peaks of pressure and clashes between airmasses are used to de- 
scribe the conflicts in media through the temporal dimension, diving into the 
human and non-human dynamics that make the controversy alive. The Weather 
Map was conceived as a digital tool to help students and scholars analyze public 
debates, according to the controversy mapping field founded by Bruno Latour. In 
particular, the visual model pushes the boundaries of network visualization, explor- 
ing advanced techniques of graphic design. The outcome is a web-based application 
developed in JavaScript and Python at the disposal of education and research. 


Keywords: controversy mapping, network visualization, visual literacy 


Representing time using a single static image is one of the most challenging exercises
in visual arts. An excellent example of this gesture comes from the photographer
Etienne Jules Marey, whose “photographic gun” captured living beings’ movements in
controlled environments. This photographic technique gave life to a series of shots
that followed one another in a linear narrative, revealing the complexity and elegance
of motion (see Figure 1). In a similarly visual way, the Futurist movement represented
technical progress by depicting the mechanics and dynamics of locomotives, airplanes
and cars. One of its representatives, Umberto Boccioni, was ambitious enough to imagine
the human body as a mechanical system in motion, molding one of the most famous
sculptures of all time, titled Unique Forms of Continuity in Space (see Figure 2).
Exhibited in New York by MoMA and the Met, Boccioni’s sculpture portrays all of an
athlete’s potential movements over time through the static materiality of bronze. These
two artists represent time differently: while Marey breaks down the motion linearly
into an interpolated sequence of images, Boccioni's artistic gesture molds time by
composing a multifaceted form nonlinearly. This latter vision by Umberto Boccioni
inspires this text in representing information through nonlinear forms (Flusser 2014),
with close attention to the temporal dimension.


Figure 1: Etienne Jules Marey's photographic gun represents the linearity of time from right to left
(Wikimedia Commons contributors 2021).


Figure 2: Umberto Boccioni's Unique Forms of Continuity in Space represents motion in a nonlinear
form (Wikimedia Commons contributors 2015).

The representation of time is a challenge that has attracted not only artists
but also designers who deal with temporal information. In much the same
way as the artworks by Marey and Boccioni, data visualizations can portray time
through linear or nonlinear representations (Rosenberg and Grafton 2010). While
the timeline (see Figure 3) is an established linear model investigated in several
doctoral theses (Huron 2014; Kräutli 2016; Vane 2019), nonlinear representations
are considerably less frequent. Among these, two examples (see Figures 4 and 5)
provide a more accurate picture of the nonlinear interpretation of time. Inspired
by tree-ring dating, Pedro Cruz's diagram of U.S. immigration uses radius for time
and direction for provenance (Cruz et al. 2022), while Kirell Benzi's network
shows the spread of CERN’s original tweet to illustrate the Higgs boson's discovery
(Benzi 2017). The data visualizations by Cruz and Benzi are pertinent for two
reasons. The first reason is formal: their visual compactness prevents the readability
issues created by wide-image linearity. The second is practical: representing
time in a nonlinear way is a less intuitive and more challenging intellectual exercise.
These reasons set the premises for presenting a new visual model.


Figure 3: Steeve Chapple and Reebee Garofalo's timeline describes the history of music linearly. Each 
layer indicates a music genre by showing the start and end dates. The height of the visualization 
corresponds to the number of genres recorded in a given period (Chapple and Garofalo 1977). 


This text further investigates the nonlinear representation of time by using weather
maps as metaphors to draw diachronic network visualizations. Also known in meteorology
as synoptic weather charts, these maps are visual instruments to predict
weather changes. Their accuracy is high in the hours following computation, but
the precision decreases over prolonged periods. Although we are all accustomed to
seeing them in newspapers and on television, their interpretation is so complex that
experts often need to explain their meaning. One reason for such complexity is the
multiple levels of information that compose weather maps, which become more
understandable with a brief explanation of their visual grammar.

Figure 4: This diagram shows the provenance of immigrants in the U.S. over time. The flow of
European immigration appears in green, while the more recent Asian flow is pink (Cruz et al. 2022).

This section is an introduction to the graphic design of synoptic weather charts.
Many experts in the field have published comprehensive books on this subject (Ahrens
and Henson 2019; Pearce 2002), but the illustration by the cartoonist Randall Munroe
(see Figure 6) is sufficient for our purposes (Munroe 2015). The most noticeable
layer of information in weather maps probably corresponds to elevation lines, usually
employed in topographic maps to indicate altitude. Meteorologists use elevation
lines, known as isobars, to represent air pressure, the primary measurement to forecast
weather conditions. When the pressure is high, the sky tends to be clear; when it is low,
the weather tends to be rainy. As in topographical maps, the number of lines indicates
intensity. Unlike topographical maps that use colors to distinguish between land and sea, weather
maps’ elevation lines are uncolored but mark the peaks of high and low pressure
with the letters H and L, which might be colored in red and blue to increase readability.
Although these graphic elements already give a static image of weather
conditions, a further layer of information enriches pressure with dynamism by
marking air collisions with front lines. Front lines draw attention to the friction between
different pressures by tracing a thick curve. While elevation lines form a
pattern all over the map, front lines indicate precise areas of interest by marking
the movement with a thicker line. When the cold air gains ground, the front line
features a series of triangles, substituted by semicircles when the warm air advances;
both are aligned along the front line to indicate the direction of movement.

Figure 5: The scientific community found evidence in 2012 of a particle hypothesized by Peter Higgs.
This graph shows the retweets originating from CERN’s initial post on Twitter (Benzi 2017).

As already mentioned, the central idea is to design a diachronic visual
method inspired by weather maps’ visual grammar, but diving deeper into the
cultural context will help to understand its primary purpose. The visual method
finds its practical use in controversy mapping, a pedagogical method created by
the philosopher Bruno Latour from the analysis of scientific controversies.

Figure 6: The illustration shows the weather forecast's visual grammar (Munroe 2015). Among the
various graphic elements that compose the visualization, elevation and front lines provide critical
information to understand weather changes by indicating pressure and movement.

The analysis of scientific controversies originated in Science and Technology
Studies to investigate the necessary conditions for scientists and engineers to recognize
their work (Latour 2021). Only when teaching at the École des Mines did
Latour transform it into a pedagogical method to study public controversies.
While the analysis of scientific controversies focused on science, controversy
mapping shifted the method’s attention to public debates external to the scientific
world (Venturini and Munk 2022; Gray et al. 2022; Latour 2021). In Latour’s course,
students were encouraged to investigate scientific debates starting from public
broadcasting. For example, during the online course Scientific Humanities, given
in the fall of 2014 through the platform France Université Numérique, students
were asked to analyze a debate of their choice starting from a newspaper article.
Following the Actor-Network Theory (Latour 2005), the initial analysis aimed to
identify in the article the human actors (e.g., politicians, journalists and scientists)
as well as the non-humans (e.g., innovative technologies, information systems and
political organizations). The identified actors were later divided into two factions
according to their standpoint in the public debate, in an attempt to delve into the
subject by mapping oppositions. This diagrammatic representation served as the
investigation's backbone, to which further information could be added, such as hierarchical
structures, verbatim quotations and relationships between actors in a sort of ontological
scheme. Although many of Latour's pupils are today concerned with climate
change (Baya-Laffite and Cointet 2014; Pryck 2022; Venturini et al. 2014),
controversy mapping covers a large variety of subjects (Seurat and Tari 2021).
This great diversity reminds us that controversy mapping is not aimed at achieving
specific knowledge but rather at learning a method of analysis with a wide
range of applications and use cases.

A set of digital tools was at the students' disposal to support their inquiry.
When Sciences Po's curriculum included controversy mapping, the médialab supported
institutional pedagogy by working on open-access software to explore public
debates. The list of digital tools included Gephi to analyze networks (Jacomy
et al. 2014), Sigma.js to visualize networks (Jacomy [2012] 2022), Hyphe to trace
Web hyperlinks (Jacomy et al. 2016) and Seealsology to reveal Wikipedia connectivity
(Density Design [2014] 2022). It is worth noting that all these tools
represented an effort not only to collect and analyze but also to visualize data.
Indeed, the visual form of information enhances the process of interpretation
and provides evidence to support arguments during presentations. At a time when visual
expertise was becoming increasingly valuable for data analysis, the médialab was probably
one of the first laboratories in the world to recognize the importance of visual
tools for students and scholars.

The theory and practice of the médialab found common ground in networks,
which acted as boundary objects between different application domains (Bowker
and Star 1999): on the one hand, the Actor-Network Theory was interested in the
complexity of social relations in technological discourses (Akrich, Callon and Latour
2006); on the other hand, social network analysis was emerging with more
intensity in the scientific environment (Lazer et al. 2009; Scott [1991] 2000). In addition,
networks' popularity increased through the advancements in data visualization,
as confirmed by books (Lima 2011), visual models (Rigal and Rodighiero
2017; Windhager et al. 2020) and retrospective exhibitions (Barabási et al. 2020).
The Weather Map described in these pages is situated precisely at the intersection
of controversy mapping and network visualization, focusing on the lack of relational
time-based visual models. Although innovative algorithms have increased the
computational load that networks can handle, enabling the analysis of complex structures,
visual grammar has barely evolved from the drawings made by Jacob Moreno almost
one century ago (Moreno 1934). The Weather Map, in this sense, represents an
effort to fill this gap by developing a new aesthetic metaphor for network visualization,
in line with what Johanna Drucker theorized (2010: 72-73).

The back end of the Weather Map finds its foundation in Media Cloud, an
open-source platform to analyze media ecosystems through millions of newspaper
articles (Roberts et al. 2021). The platform allows users to make different requests,
including queries on specific subjects treated in the media. Even though
mainly related to American culture, its collection was well suited to examining public
debates along a temporal dimension, a kind of information rarely accessible online in
newspaper archives. When the project started, Media Cloud covered around
ten years of online newspapers, a suitable time window to investigate the subject
of biomass energy. The choice of biomass energy as a case study had
several reasons, among them the involvement of private and
public actors, the growing visibility in public broadcasting and the presence of
technology as one of the non-human actors.

The Weather Map visual model answers questions related to the public de- 
bate on biomass. Who are the individuals and the organizations involved in the 
discussion? What are the specific topics of discussion to which they are commit- 
ted? Who and what is emerging from this discussion? Who are the actors leaving 
the debate? Moreover, where are the frictions between opponents? Compared to
other network visualizations, the insights that the Weather Map can provide are
more multifaceted because of the different layers of information. However, as explicitly
described in this text, temporality plays a central role in the diachronic
visual model.

The creation process starts from Media Cloud with data extraction. After refinement,
the most appropriate query to cover the subject of biomass was “(biomass
OR ‘bio-energy’ OR ‘bio energy’ OR ‘bio-economy’ OR ‘bio economy’) AND
language:en” for English-language collections including the United States, India, the Russian
Federation, France, the United Kingdom and Ghana. The request returned 20,000
articles published between 2011 and 2020 as a list of JSON files downloaded via
API (Rodighiero 2021). To avoid the copyright infringement that sharing full texts
would entail, each file contained only metadata and the outcomes of text analysis. Metadata such as
title, source, date and URL came along with extracted entities generated from Natural
Language Processing techniques to identify people, organizations and places
mentioned in the full-text body. In addition, Media Cloud’s topic modeling provided
a list of keywords from the New York Times’ classification, which is valuable
for pairing specific topics with individuals and organizations. For the Weather
Map, entity extraction is a key computational technique to identify and space out
actors on the Cartesian plane, facilitating the distant reading of hundreds of thousands
of documents.
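
The extraction step described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout: the field names (`date`, `entities`, `keywords` and so on) and the sample articles are invented for illustration and are not taken from the Media Cloud export format. The sketch counts how many articles mention each entity per year and pairs entities with the keywords of the articles in which they appear.

```python
import json
from collections import Counter

# Hypothetical export: every field name and value here is an assumption made
# for illustration, not the documented Media Cloud schema.
SAMPLE_EXPORT = """
[
  {"title": "Biomass plant approved", "source": "Example Times",
   "date": "2015-03-02", "url": "https://example.org/a",
   "entities": [{"name": "Shell", "type": "ORGANIZATION"},
                {"name": "Barack Obama", "type": "PERSON"}],
   "keywords": ["energy and power"]},
  {"title": "Biofuel from waste", "source": "Example Post",
   "date": "2016-07-11", "url": "https://example.org/b",
   "entities": [{"name": "Shell", "type": "ORGANIZATION"},
                {"name": "Shell", "type": "ORGANIZATION"}],
   "keywords": ["energy and power", "waste materials"]}
]
"""


def entity_names(article):
    """Return the distinct entity names mentioned in one article."""
    return {entity["name"] for entity in article["entities"]}


def mentions_per_year(articles):
    """Count, for each entity, the number of articles mentioning it per year."""
    counts = {}
    for article in articles:
        year = int(article["date"][:4])  # assumes ISO dates (YYYY-MM-DD)
        for name in entity_names(article):
            counts.setdefault(name, Counter())[year] += 1
    return counts


def keywords_per_entity(articles):
    """Pair each entity with the keywords of the articles mentioning it."""
    pairs = {}
    for article in articles:
        for name in entity_names(article):
            pairs.setdefault(name, Counter()).update(article["keywords"])
    return pairs


articles = json.loads(SAMPLE_EXPORT)
yearly = mentions_per_year(articles)
topics = keywords_per_entity(articles)
```

Counting articles rather than raw mentions is a design choice of this sketch: an entity repeated within one article is counted once, so a single long article cannot dominate the yearly trend.
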

Individuals, organizations and keywords were selected to draw the network
visualization. Each entity was profiled by collecting name, type, frequency and
co-occurrence by processing the downloaded JSON files. It is essential to keep in
mind from the beginning that frequency is used to measure the trend of every
entity, while the co-occurrence of entities is fundamental to arranging the network
visualization. The latter is processed first via UMAP dimensionality reduction that
spaces out individuals, organizations and keywords (McInnes, Healy and Melville
2018). Word embedding, usually employed in digital humanities to visualize corpora
by word frequency (Berger, McDonough and Seversky 2017), organizes entities
in the two-dimensional space: the more often two entities are mentioned in the same
article, the closer they appear on the map. Such a spatial organization of individuals,
organizations and keywords makes it possible to identify thematic
clusters, which are critical to divide the biomass debate into subcategories for
better understanding.
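
The profiling of frequency and co-occurrence can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's implementation: the function names, the `min_mentions` parameter and the Jaccard-style distance are hypothetical choices, since the text does not specify how co-occurrence counts are turned into the input of the dimensionality reduction.

```python
from collections import Counter
from itertools import combinations


def profile_entities(articles, min_mentions=1):
    """Return (frequency, co_occurrence) for entities above a threshold.

    `articles` is a list of sets of entity names, one set per article.
    `min_mentions` is a hypothetical visibility threshold.
    """
    frequency = Counter()
    for entities in articles:
        frequency.update(entities)
    kept = {name for name, count in frequency.items() if count >= min_mentions}

    co_occurrence = Counter()
    for entities in articles:
        # Sorting makes (a, b) and (b, a) fall under the same key.
        for a, b in combinations(sorted(entities & kept), 2):
            co_occurrence[(a, b)] += 1

    frequency = Counter({name: frequency[name] for name in kept})
    return frequency, co_occurrence


def distance(a, b, frequency, co_occurrence):
    """Jaccard-style distance (an assumption of this sketch).

    0.0 when two entities always appear together, 1.0 when they never do.
    """
    together = co_occurrence[tuple(sorted((a, b)))]
    union = frequency[a] + frequency[b] - together
    if union == 0:
        return 1.0
    return 1.0 - together / union


corpus = [{"A", "B"}, {"A", "B", "C"}, {"A"}]
freq, cooc = profile_entities(corpus)
```

A matrix of such pairwise distances is one plausible input for a dimensionality-reduction step such as UMAP, which would then place frequently co-mentioned entities close to each other on the plane.
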

As with many algorithmic outputs, the Weather Map undergoes a process of
parametrization. Entities, for example, are manually filtered to limit the number
of elements and reduce visual complexity. For biomass energy, the entities' visibility
threshold was set to 50 mentions over ten years. In addition, UMAP hyperparameters
must be tested to find a proper arrangement for the network. Within
data selection and treatment, parametrization is a necessary procedure in
the design process, affecting the final result through subjective configurations.

Another critical element of the Weather Map concerns how actors' activity is measured,
which is correlated to the frequency of entities' appearance in newspapers.
Meteorological pressure is used as a metaphor to show whether an entity is taking
up space in the public debate. The pressure measurement results from a simple linear
regression applied to each entity according to its yearly frequency of appearance
in newspapers. When the slope of the regression line is positive, mentions have grown
over the years and the entity has become more visible in public debates. Then, slopes
are normalized by subtracting the public debate's average trend. The resulting frequency
and arrangement are visualized during the design process for intermediary
inspections (see Figure 7). Finally, high- and low-pressure clusters of entities are recognized
using HDBSCAN (McInnes, Healy and Astels 2017), allowing the identification
of collisions between warm and cold air when they overlap (see Figure 8). One of the
most exciting outcomes of this process is the decomposition into subcategories, demonstrating
that more specific controversies give form to the whole public debate.
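
The pressure measurement lends itself to a short worked sketch. It assumes two things the text leaves open: that the "average trend" is the mean of all entity slopes, and that years without mentions count as zero. The function names are hypothetical.

```python
def slope(years, counts):
    """Ordinary least-squares slope of yearly counts against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


def pressure(yearly_counts, years):
    """Map each entity to its slope minus the mean slope of all entities.

    `yearly_counts` maps an entity name to a {year: mentions} dictionary.
    Missing years are treated as zero mentions (an assumption of this sketch).
    Positive values suggest an emerging entity, negative ones a fading entity.
    """
    slopes = {
        name: slope(years, [counts.get(year, 0) for year in years])
        for name, counts in yearly_counts.items()
    }
    average = sum(slopes.values()) / len(slopes)
    return {name: value - average for name, value in slopes.items()}


YEARS = [2011, 2012, 2013]
example = {
    "rising": {2011: 1, 2012: 2, 2013: 3},
    "falling": {2011: 3, 2012: 2, 2013: 1},
    "steady": {2011: 2, 2012: 2, 2013: 2},
}
result = pressure(example, YEARS)
```

Subtracting the average slope means that an entity counts as emerging only if it grows faster than the debate as a whole, which keeps a general rise in coverage from turning every actor into a high-pressure one.
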

The computed data were then exported in CSV files, ready to be integrated into
a web-based interface offering an advanced level of interactivity. The functions of
zooming or selecting, for example, permit the distribution of information on different
layers (Shneiderman 1996). Indeed, the Weather Map is a data visualization embedded
into a web-based interactive interface composed of different layers and
panels (see Figure 9). The entities' frequency gives form to elevation lines by molding
“peaks” over the network visualization, indicating the most active clusters of
debate that would be harder to recognize otherwise (Rodighiero and Romele 2022).
Its graphic rendering relies on d3-contour, made available by the well-known library
for data visualization d3.js (Data-Driven Documents [2017] 2023; Bostock,
Ogievetsky and Heer 2011). This topographical background is then enriched by keywords,
which orient viewers by showing the different topics treated in clusters. Their
size differs according to frequency, so the most used keywords are more visible
than others. Entities at this level of zoom are represented by placeholders with the
sign “+” to leave more visibility to keywords and contours at the forefront. Then,
emerging and disappearing clusters are indicated by the letters H and L, situated at the
center of the polygons introduced in Figure 8. The front lines represent the most
valuable graphic element that enriches the distant view by marking the antagonisms
between opposing groups of actors. They are geometrically generated starting
from the midpoint between the centers of two opposed clusters, accentuating
the curve's convexity toward the emerging actors. The result of such a mapping
differs from Latourian controversy mapping, in which two factions usually represent
the entire public debate. Indeed, the Weather Map shows that public debates
are composed not of two fronts but of many. Figure 9, for example, shows how the
general discussion on biomass is multifaceted, identifying three prominent
topics: climate change, energy production and national politics.

Figure 7: Blue and red correspond to emerging and disappearing actors in the public debate, while
gray indicates keywords from the New York Times' classification. The size of the circles is
proportional to frequency.

Figure 8: The algorithm for cluster identification is run twice, once for emerging
and once for disappearing actors. The overlaps between blue and red clusters point out
collisions like in a zero-sum game.
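
The geometric construction of a front line, starting from the midpoint between two opposed cluster centers and bulging toward the emerging actors, might be sketched as below. The text does not give the exact construction, so everything here is an assumption made for illustration: the use of a quadratic Bézier curve, the orientation perpendicular to the axis joining the two centers, and the `half_length`, `bulge` and `steps` parameters.

```python
import math


def front_line(emerging, disappearing, half_length=1.0, bulge=0.5, steps=8):
    """Sample a quadratic Bezier curve standing in for a front line.

    The curve is centered on the midpoint between the two cluster centers,
    runs perpendicular to the axis joining them, and is pulled toward the
    emerging cluster by `bulge`. All parameters are hypothetical.
    """
    (hx, hy), (lx, ly) = emerging, disappearing
    mx, my = (hx + lx) / 2, (hy + ly) / 2      # midpoint between the centers
    dx, dy = hx - lx, hy - ly                  # axis pointing to the emerging center
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("cluster centers must differ")
    ux, uy = dx / length, dy / length          # unit vector toward the emerging center
    px, py = -uy, ux                           # unit vector perpendicular to the axis

    start = (mx - px * half_length, my - py * half_length)
    end = (mx + px * half_length, my + py * half_length)
    control = (mx + ux * bulge, my + uy * bulge)

    points = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * start[0] + b * control[0] + c * end[0],
                       a * start[1] + b * control[1] + c * end[1]))
    return points


curve = front_line(emerging=(2.0, 0.0), disappearing=(0.0, 0.0))
```

In this example the centers lie at (0, 0) and (2, 0), so the curve runs from (1, -1) to (1, 1) and its apex sits slightly to the right of the midpoint, on the side of the emerging cluster.
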

When exploring the concept of Anti-Zoom, Bruno Latour correctly concluded
that the optical zoom in digital maps is still a myth. Maps of one territory change
according to scale, and zooming is necessary to glue them together in a continuous
visual effect (Latour 2014). In the Weather Map, elevation lines, keywords and front
lines offer different affordances that invite readers to zoom into specific areas. Figure
10, for example, shows the area associated with politics where some notable actors
can be identified. After zooming, the “political cluster” reveals in red the actors
entering the public debate and in blue the ones leaving it. Among them are Barack
Obama and Donald Trump, with the former leaving room for the latter; it is worth
remembering that this map reflects data up to December 2020. In addition, the left
panel allows users to access more detailed information: clicking on Trump's icon
activates a contextual panel containing the Wikipedia link, some statistics of frequency,
the general trend and a random selection of ten hyperlinks to newspaper
articles used as sources of information. The contextual panel complements the network
visualization, offering additional insights to readers and developers.

Figure 9: This Weather Map represents the public debate on biomass energy. At first glance, the
elevation lines indicate the most active areas, enriched by keywords to summarize the content. Letters H
and L stand for high- and low-pressure peaks, whose collision is marked by curved front lines. The
interface is publicly available for testing at https://rodighiero.github.io/weather-map/ (Rodighiero 2021).

An example of navigation can start from this area (see Figure 10). Looking for
companies involved in U.S. politics, a reader might click on Shell, a multinational
active in oil and gas. From the contextual panel, the reader might then
open a newspaper article describing how Shell is interested in reusing coffee
beans to create biofuel. This simple example also shows how instruments like the
Weather Map are intended not to provide proof but to serve as exploratory systems
for delving into complexity (Klein 2022).


Conclusions 


The Weather Map’s visual grammar transforms the heuristic scope of controversy mapping
by modifying the conditions of the representation. At Sciences Po and the École des
Mines, the field of controversy mapping uses static networks of actors or event-based
chronologies. According to this methodology, the relations that articulate networks remain
invariably inert. The lines connecting the different points harden, reduce and
immobilize the social game of friction (Tsing 2005), enrolment tactics (Callon 1986)
and alliance strategies (Callon and Latour 2017). A static representation prevents viewers
from seeing the whole life of these lines, including their passage into new arenas of debate
(Dodier 2003). The same is true for the chronological frieze, which reframes the past
into an alignment of dots whose relational logic is summarized in a game of successions
and accumulations: this event + this event + this event = the controversy. Just as entomologists
kill butterflies and pin them on blank sheets of paper to observe them, controversy
mapping is currently equipped with tools that kill the controversy to study it.

Faced with these challenges, the Weather Map offers an alternative. First, it
uses thermal semiotics polarized between warm and cold to approach the controversy
in the making by mapping the formation of fronts where the debate crystallizes.
This visual method opens the way to studying multi-positional movements
through the plurality of actors that shape controversial events. Instead of being
re-examined once it has occurred, the controversial event is captured in statu nascendi,
before becoming an event. In addition, the whole logic of visualization is turned
upside down. To put it in an image, it is no longer a question of waiting for the
rain to fall but of considering the clouds announcing its coming. Technically, the
Weather Map promises a shift in observing controversies' emergence and growth
by placing the inquiry's standpoint one step ahead of their institutionalization
and public recognition. This shift brings controversy studies closer to social
life in motion. It makes it possible to return to the initial program expressed by
Bruno Latour, which consisted of looking at science and society in the process of
being made rather than already made (Latour 1999).


Index


CU 154, 158, 159, 161, 163, 170-72, 175, 177, 179,
182-84 See close-up

cultural analytics 163, 425 

cultural technique 7, 244 

Cytoscape 13, 367, 370, 375-78, 385-89, 394, 
402, 405 


Dancing Baby 11, 121-24, 149 

data sonification 216 

data visualization 19, 30, 32, 42, 108, 337, 367, 
369, 370, 373, 374, 403, 404, 414, 451, 455, 
457, 458, 465, 470 

datafication 163 

datascapes 23, 410, 414, 421, 425 

- classical 423 

- exploration tools as 409 

- next generation of 414 

DECIMA 10, 88, 91, 92, 95 See Digitally Encoded 
Florentine Census 

degree 

- of a node 274, 277 

- of change in detail 318 


- of detail 8, 195 

- of entropy 229 

- of fractality 289

- of generality/specificity 297 

- of internal similarity 227 

- of irregularity 303 

- of realism 205 

- of self-similarity 302 

- of spatial clumping or aggregation 311 

- of variation over time 221 

degrees of generality and specificity 290, 300, 
302, 308, 314 

demonstrosity 256 

desemantification 244 

detail shot 154 

DHA 56-58, 63, 68, 70, 74 See discourse-historical 
approach 

diction 246, 250, 254, 259 

Die Räuber 271-74, 276, 277, 281, 282

digital argument 

- effect of 344 

- experiments in 357 

- long-form 336, 338 

- nature of 335, 339 

- publication of 339 

- structured form of 344 

digital audio 214, 215 

digital cartography 439 

digital environment 11, 15, 189, 190, 196-99, 201,
202, 209, 285, 331

digital lens 56, 58, 64

digital pedagogy 105

digital preservation 192, 426

digital reconstruction 209

digital space 12, 190, 196, 200, 202, 237, 240,
241, 243, 247, 248, 255, 257, 258

digital storage 215

digital textuality 238

digital writing 12, 237-45, 247-49, 253-57

- iconic 246

Digitally Encoded Florentine Census 88

digitization 54, 62, 66, 67, 71, 73, 81, 82, 94, 104,
189, 192, 193, 195, 196, 198, 209, 215, 262,
263, 340, 369, 412, 427, 468

Dirty Dancing 123 
distant framing 159

distant reading 7, 14, 47, 50, 70, 122, 126, 142, 
151, 164, 175, 261, 264-66, 269, 270, 275, 
276, 284-87, 331, 456, 469 

document entropy 294, 295, 298, 319 

Drama Corpora Project 272, 273 

drama history 279 

Duckroll 122 

dust(s) 307, 308, 318 

- Cantor 307, 319 

- fractal 315 

- one-dimensional 318 

Dutch East India Company 291 


edge weight 372, 379, 380, 382, 390, 394, 398 

electronic literature 240, 244, 258 

- 3D immersive virtual 244, 258 

electronic noise 225 

emotion display 151, 153, 155-57, 159-61, 
163-65, 168, 170, 173-75, 179, 180 

English Literary History 240, 258, 259 

enslaved agency 99, 101, 110 

enslaver oppression 101 

environmental noises 226 

Europresse 133, 134, 136, 139 

Excel 293-95, 297, 299, 303, 305, 307, 
309, 332 

exceptional normal 77, 84, 85, 94 

explainability 

- technical and visual 409 

exploratory data analysis 376 


Facebook 123, 129, 132, 141, 142, 368 

Factiva 133 

Felix the Cat 126 

female characters 151, 152, 160, 161, 165, 168, 
170, 172, 173, 175, 179, 183, 184 

feminist reading 165, 184 

field clustering 418 

figures of speech 246 

film 

- language 153, 154, 184 

- techniques 152, 153, 158, 162, 172, 177 

film theory 

- cognitive 161, 162, 177 

- feminist 161 

filmic grammar 36 
First World War 59, 60

focal length 6, 27 

focalization 20, 27, 34, 35, 39 

fougéadoire 6 

Founding Generation 97, 98, 112 

fractal 

- dimension 289, 303-7, 311, 
318, 325, 330-32 

- geometry 12, 287, 308, 311, 317, 323, 325, 
330, 331 

framing theory 152, 153 

France Université Numérique 454 

Freedom on the move 103-5, 114 

frequency scale levels 217 

frequency spectrum 220, 221, 226, 227, 229, 231 

front lines 453, 454, 459, 460 

fugitive slave advertisements 102, 110 

futurist movement 449 


GarganText 125, 126, 407, 408, 411, 412, 421, 
423, 424, 469 

gender representation 151, 153, 159, 160, 164 

generalization 

- cartographic 427, 429, 432, 435, 441 

- concept of 428 

- content 432 

- in historical maps 428 

- of old maps 431, 438 

- operation(s) 429, 430, 432, 439, 444 

- operators 432 

- quality 432 

generalized view 200 

generative pre-trained transformers 25 

generator 288, 307 

Geocities 121, 122, 149 

geographic information systems 7, 87, 347, 470 

geographic knowledge 106, 427 

Geography of Slavery in Virginia 107, 108, 114 

geoinformation technologies 199 

Gephi 13, 367, 369, 370, 371, 375-78, 390-404, 
410, 412, 424, 455, 465 

German Drama Corpus 272, 276, 278 

GIS 7, 77-79, 87, 88, 90-93, 95, 96, 106, 115, 347, 
360, 432, 435, 436, 444-46 See geographic 
information systems 

- historical 10, 87, 88, 90, 91 


GitMA 56 

gliding box 311, 312 

- algorithm 311 

- side of the 311, 312, 328 

global cooling 291, 293 

global history 15, 83, 85, 86, 91, 114, 333 

global trade 295 

Godwin's Law 11, 121 

Google Books 80, 81, 369 

Google Earth 266, 267 

GoogleMaps 429 

grammatical patterns 49 

Grand Theft Auto IV 126 

granularity 

- informational 12, 302, 323, 324 

- large-scale 184 

- of the data 221

- temporal 27 

Graph Poem 12, 237, 239, 240, 242, 245-47, 249, 
257, 259, 469 

graph theory 237, 239, 240, 246, 271, 403, 405 

graphical user interface 247, 375 

Grumpy Cat 126 

GUI 375 See graphical user interface 

Gulliver's Travels 290, 314, 318, 321, 332 


Hamlet 275 

Hampster Dance 121, 122 

Harlem in Disorder 12, 335, 338, 339, 345, 346, 
350, 351, 357 

Harlem Shake 11, 119, 120, 130-37, 139, 141-46, 
148-50 

Hathi Trust Research Library 369 

heritage 

- cultural 11, 189, 190, 198, 199, 209-11, 215, 412, 
426, 466 

- digital 189, 199, 210, 211, 468 

hermeneutic circle 52 

hermeneutics 50, 71, 73, 287, 325, 361, 467 

- critical 367, 370 

- crucial act of 370 

- digital 19, 214, 217, 287, 330 

- human 215 

- traditional 90 

heuristics 43-46, 50-52, 66, 69, 438, 443, 446 

higher mathematics 244 


historicization of virality 119, 129 

historiography of slavery 97, 102 

history 

- of a family's own connections 290 

- of American slavery 101 

- of concepts 44 

- of digital cultures 119 

- of enslaved Africans 290 

- of microhistory 84 

- of newspaper discourses of the Weimar 
Republic 62 

- of online virality 11, 120 

- of representations 48 

- of science and technology 467 

- of slavery 97 

- of sound 215 

- of the Bible in the United States 349 

- of the Internet and the Web 119, 468 

- of the Weimar Republic 59 

- of visualization tools 374 

Hivi project 131, 133, 468 

human listening 216 

human sounds 225 

humanistic inquiry 369 

hyperlinks 241, 344, 348, 357, 455, 459 


iconic writing 255 

iconicity 12, 237-40, 245, 247, 248, 255-59 
iconostasis 

- 3D model of the 193, 194, 197, 200, 210 
- carved 189 

- carved wooden 191 

- digitization of the 192, 193, 195, 468 

- laser scanning of the 195, 204 

- of the Transfiguration Cathedral 190, 200 
- reconstruction of the 209 

If It Doesn't Spread, It's Dead 129, 149 
image ecology 241 

imagined distance 166 

Immodest Acts 85 

InaSpeechSegmenter 222 

inclusion quotient 164 

individual experience 98 

information flaneur 31, 35, 42 
information retrieval 51 

initiator(s) 288, 307, 319, 321, 322, 324 
intercultural communication 163, 468 
intercultural framing 156

intercultural reading 165, 183 

intermedia remix 257 

intermediality 240, 242, 245, 248, 254, 260 
Internet Archive 369 

Internet phenomena 119-22, 126, 131, 146 
interrelational conglomerates 242 
interrelational reticular architecture 243 
inter-temporal matching 418 

Iramuteq 147 


Java 293 

JavaScript 449 

Jim Button and Luke the Engine Driver 3, 4, 14 

Journal of Digital History 73, 233, 336, 348, 349, 
361, 467, 468, 470 

Journal of Multi-Media History 336 

JSON 456, 457 

JSTOR 81 

Jupyter notebooks 348 


Kepler.gl 108 

key words in context 137 

knee shot 154 

Know Your Meme 121, 125, 126, 149 

knowledge extraction 41 

knowledge graph 10, 19, 20, 23-27, 31, 34-36, 
39-42, 423, 467 

Koch curve 288, 299, 307, 313, 319 

KWIC 137 See key words in context 

KYM 121, 122, 125 See Know Your Meme 


lacunarity 311-13, 318, 324, 325, 328, 330, 332, 333 

lanternists 6 

latent Dirichlet allocation 293 

Law & Order Criminal Intent 11, 151, 152, 155, 160, 
164, 168, 170, 172, 183 

layout(s) 

- circular 378, 380 

- circular network graph 380 

- compound spring-embedder 378, 385, 
388, 389 

- edge-weighted force-directed 378, 385 

- edge-weighted spring-embedded 378, 385, 388 

- elliptical 380 

- elliptical (circular) 382 

- elliptical (circular) graph 383 




- Energy 382 

- ForceAtlas2 394, 398, 399 

- force-directed 26, 372, 377, 378, 383, 385, 390, 
392, 405 

- Fruchterman-Reingold 385, 390, 393, 394 

- grid 378, 385 

- Kamada-Kawai 383, 384 

- map 431 

- OpenOrd 396, 397 

- prefuse 386, 387 

- prefuse force-directed 378, 385 

- Yifan Hu 391, 395 

- Yifan Hu proportional 391 

- z-text 302 

LDA 293 

lenses 

- cultural 240 

- focal 6 

- lenticular 28 

Les Misérables 13, 367, 370, 376-80, 384, 387, 
388, 390-92, 394, 396, 397, 399, 401, 404 

level 

- concept of 407, 411 

level(s) 

- of abstract conceptualization 32 

- of abstraction 26, 30, 31, 266 

- of acoustic object recognition 221 

- of complexity 423 

- of detail 30, 31, 34, 35 

- of evidential certainty 36 

- of frequency-based analysis 217 

- of generality 12, 20, 287, 294, 295, 307, 
324, 325 

- of generality and specificity 12, 287, 294, 295, 
307, 317, 325 

- of granularity 221, 414 

- of historical periodization 33 

- of observation 13, 415, 416, 418 

- of organisation 8 

- of reading 120, 151, 164, 172, 177 

- of scale, resolution and abstraction 19 

- of segmentation 272 

- of similarity 229 

- of sonic diversity 224 

- of sonic vibration 216 

- of spatial detail 199 

- of three-dimensional photorealism 41 


- of zoom 30, 33, 458, 461, 462 

- of zoomed visualization 28 

lingual inscription 242 

lipdubs 132 

literary history 149, 240, 256, 262-64, 277, 278, 
283-85 

literary studies 49, 239, 240, 261-63, 265, 267, 
271, 276, 281-83, 285, 286, 340, 405 

Little Ice Age 291 

long shot 21, 154, 158, 159, 161 

long-range correlation 308, 309, 325, 326, 332 

longue durées 33 

LS 154, 159, 170, 171, 175, 177, 179, 182, 183 
See long shot 


machine learning 11, 23, 26, 41, 151, 164, 
241, 469 

macroanalysis 263, 269, 270 

macrohistory 91 

macroscale 92 

macroscope 7, 23, 33, 35, 42, 287, 407, 411, 424, 
425, 432 

magic lantern 6, 14 

magnifying glass 8, 89, 287 

Making Deep Maps 347, 360, 363 

male characters 160, 161, 172, 173 

male gaze 161, 175, 184, 186 

MALLET 293-95, 297, 314, 330, 331 

map(s) 

- cognitive 323, 324 

- deep 14, 347, 350, 360 

- digital 30, 266, 267, 346, 427, 429, 459 

- digitized old 427 

- generalized 429, 430, 438 

- historical 13, 35, 88, 427, 428, 431, 434, 
438, 444 

- old 427-31, 434-36, 438, 440, 442-44 

- weather 451-53 

- zoomable digital 30 

marginalized communities 98 

maritime trade 291 

MASH 20 

mathematical model 239 

mathematical modelling 246 

mathematically chaotic effects 253 

MCU 154, 170-72, 175, 177, 179, 182, 183 See 
medium close-up 


mean reversion 309 

Media Cloud 14, 449, 456, 466 

media reading 165, 170, 183 

mediating transcendence 247 

mediation 8, 11, 14, 189, 215, 238, 239 

medium close-up 154, 159 

medium long shot 154, 159 

medium reading 119, 120, 130, 131, 146, 148 

memetics 119, 123 

Menocchio 10, 77, 82, 84, 93 

mesh 197, 205-7 

meso-scale 269, 270 

- of literary studies 269, 275, 276, 279, 282 

meter 191, 193, 246 

method of the least squares 305, 308 

microhistory 9, 15, 77-79, 82-91, 93-96, 110, 
290, 302, 319, 321, 333, 335, 339, 351 

- 2.0 91, 96 

- global 8, 79, 85, 94, 289 

micro-level substrate 238 

microscale 92 

microscope 5, 8, 57 

MLS 154, 170, 171, 175, 177, 179, 182, 183 See 
medium long shot 

monster 

- literary 255 

- ontologically intermedial 255 

Monster Theory 2.0 254, 258 

monsters 

- binary-straddling three-headed 255 


monstrosity 12, 237, 239, 240, 248, 249, 254-58 


monstrous hybridisation 249 

Mother of God 193, 200 

Multimodal Analysis Software 164 
multimodality 163 

multi-scalar reading 120 

multi-scale architecture 239, 240, 253, 257 
music theory 232 


named entity recognition 25 

narrative history 10, 21, 23, 33, 41 

narrative modelling 41 

narrative(s) 

- braided 84, 96 

- chronological 13, 335, 338, 352, 356, 
357, 359 

- collective 101 
- colonialist 107 

- data-driven 9 

- digital 350 

- filmic 6 

- historical 10, 19, 22, 98, 112 

- hyperlinked 335, 338 

- individual 108, 109 

- interactive 91 

- layered 109 

- linear 339, 347, 351, 449 

- national and personal 97 

- print 342, 343 

- slave 105 

- spatial 7, 14, 108, 347, 348, 350, 360 

natural language processing 25, 90, 237, 239, 
241, 257, 280, 285, 456, 468, 469 

NCP See network control protocol 

Neatline 347, 348, 362 


network analysis 83, 239, 246, 247, 261, 270, 271, 
274-76, 279, 280, 282, 285, 340, 367, 371, 
375, 403, 405, 421, 425, 469 

- libraries 253 

- literary 261, 262, 269, 271, 275 

- of German plays 261, 262 

- of Twitter 404 

- scholarship 377 

- social 286, 403, 404, 455, 466 

- software 367, 370 

- tools 369, 375, 403 

network control protocol 249 

network graph(s) 13, 242, 345, 367, 368, 372, 
377, 395, 397, 399, 400, 403 

- aesthetic dimension of 396 

- aesthetics of 370, 373 

- force-directed 372 

- randomly generated 373 

network imagery 368 

network theory 273, 275, 284, 368, 404, 454 


network visualization 275, 276, 367-71, 376, 380, 
393, 396-98, 401, 403, 449, 455-59, 465 
- diachronic 451 
- of large-scale data 371 
- practices 397 
- software 370 
- tools 369, 374, 377, 378, 402, 403 
network(s) 
- character 271, 280-82, 384 
- co-presence 271-73, 275-77, 279-81 

- coreference 271, 281, 282 

- unimodal 371, 379 

networked textual performativity 254 

Never Gonna Give You Up 122 

New York Times 143, 283, 285, 456, 458 

NLP 66, 90, 239, 240, 241, 246, 253, 257 See 
natural language processing 


node degree 372, 379, 380, 382, 384, 390, 394, 
398, 400, 401 
NodeXL 369 
noise category 226 
non-contiguous reading 267 
normative subject 85 
Northern Carolina runaway slave notices 103-5 
Numa Guy 129 
Nyan Cat 126 


objective framing 153 

OCR 55, 66, 80 

Old Bailey 23, 33, 37, 38 

Old Bailey Macroscope 23, 33, 35, 36 
Old Bailey Voices 23, 35 

Omeka 80, 336, 347, 350 

online virality 9, 11, 120, 468 
OntoRefine 25 

OpenRefine 25 

OpenStreetMap 444 

operative writing 244, 245 

optics 8, 189, 190, 196, 197, 199, 200, 202, 209 
- adjustment of 189, 209 

Orthodox Church 

- Eastern 211 

- Russian 191, 192 


page poems 241 
painted icons 191 


Pajek 13, 367, 370, 375, 378, 380-85, 388, 394, 
402, 405 
Paper Trails 345, 346, 360 


Paris Enquêtes Criminelles 11, 151, 152, 155, 159, 
160, 164, 168-72, 174-78, 180, 182, 183 
particularity 238 
perceptual spectrum 216 
performance theory 238 
performative relays 238, 240, 242, 257 


performativity 237, 242, 248, 255, 256, 259, 404 


Periscope 129 

Perlego 81 

Perm Art Gallery 11, 189-92, 195, 210, 211 
Phantasmagoria 6, 14 

Photogrammar 345, 430 
photogrammetry 29, 193, 211 
photographic gun 449, 450 

Photoshop 376 

phylomemy 

- global shape of the 419 

- reconstruction process 407, 417, 418, 469 
Pieter Bruegel the Elder 291 

platform programming 238 

Plumes 289, 314, 318, 321, 332 
poem-nodes 237, 253 

poetry diction 12, 237 

point cloud(s) 29, 195, 197, 200, 204 
PointBox 197, 210 

post-photography 241 

presentification 238, 242 

principle of least effort 313, 333 

print monograph 335, 339 

probability 

- conditional 411, 423 

processual performative anthology 253 
Project Gutenberg 290, 331, 332 
proximity 92, 348, 373 

- cultural 151, 153, 183, 185 

- polygonised 34 

- spatial 345 

psychological reading 165 

Python 56, 166, 234, 247, 250, 253, 375, 449 


quantitative history 91, 265, 339, 340 


racial disorder 337, 338, 352 
racial history 102 

racial violence 13, 338, 359 
radio history 213 

radio monopoly 218, 223 
Rainette 147 

random walk 308, 310, 316 
Reddit 125, 126 
reductionism 238 

Reinert method 147 
ResearchGate 81 
resolution 


- temporal 418 

rhythmic analysis 227 

rhythmic pattern recognition 217 

rhythmic variations 227 

Rickroll 122, 126-28, 131 

root mean square fluctuation 308 

runaway slave advertisements 10, 97, 98, 101-4, 
106, 107, 110, 112, 113, 114 


scalability 97, 110, 111, 120, 146, 209, 217, 240, 
280, 281, 289, 413 

scalable reading 7, 9, 10-12, 43-46, 49, 53, 56, 
62, 66, 70, 73, 119-21, 129, 130, 233, 261, 
262, 266-68, 270, 275-77, 279, 280, 285-87, 
361, 411 

Scalar 339, 350, 351, 355, 357-59 

scale 

- cartographic 13, 427, 428 

- metaphor of 407-9 

- notion of 1, 8, 9, 407, 409, 410, 419 

- of analysis 85, 93, 319, 410 

- of description 13, 415, 416, 419 

- of historical research 79 

- of microhistory 79 

- of observation 85, 302 

- of slavery 101 

- of the data 37 

- of the European continent 90 

- of the individual body 90 

- of the print forms 357 

scaling 

- cartographic 9 

Schrödinger's Cat 126 

Scikit-maad 219, 234 

Scopus 13, 411, 412 

SCoT 48 

sculpture(s) 

- Orthodox 191 

- religious 189, 190, 192, 193, 199-201 

- wooden 189, 190, 192-94, 197, 201, 202, 205, 
207, 210, 211, 468 

Seated Savior 205-7, 211 

Second World War 37, 146 

self-emancipation 101, 102, 104, 106-9, 113, 114 

semantic landscapes 418 

semantic map 26, 407, 412, 423 
semiologies of map symbols 431 

shot scale 11, 151-54, 157, 158, 164, 179 

- audiences' emotional involvement through 162 

- conventions 153, 166, 170 

- cultural 163 

- distribution 151, 161, 162, 164, 170, 171, 175-78, 
180, 182-84 

- distribution patterns 162 

- framework 154, 166, 167 

- terms and conventions 166 

- types 161 

- variable 165 

signal processing 213-15, 217 

- digital 213 

similarity 

- apparent 170, 175 

- Gestalt principles of 390 

- self- 288 

- semantic 302, 332, 418, 421 

- temporal 423 

- trend 177 

- visible 200 

Simpson's reciprocal index 219 

SketchFab 196, 197, 202, 210, 211 

small data 7, 14, 21, 33, 42, 262, 266, 331 

social imprints 146 

social negotiations 242 

social platforms 133, 141, 143, 146 

social scientific history 89 

socio-mathematical algebra 424 

sonic devices 246 

sonic diversity 11, 214, 219, 221, 224, 231 

source criticism 45, 51, 52, 58, 69, 72, 130, 434 

- digital 52 

spatial history 77-79, 84, 87, 90, 92-95, 346, 467 

- digital 93, 94 

spatial humanities 8, 84, 88, 360, 429, 432, 
445, 470 

spatialization 372 

speech act(s) 279, 281 

spreadability 121, 141 

spreadable medias 129 

StoryMap 347 

stratified representation of meaning 291 

stylistic features 246 

subject-in-transit 34, 35, 37, 40 
surrogate(s) 267-69, 275, 280 
synoptic weather charts 13, 451, 452 
- visual grammar of 449 

systemic monsters 239 


TCP See transmission control protocol 

TEI 272, 273, 290 See Text Encoding Initiative 

telescope 5, 8, 57 

terms-to-terms relationships 418 

text as a scalable structure 297 

Text Encoding Initiative 272 

text mining 13, 45, 51, 137, 269, 340, 407, 408, 
411, 417, 418, 421 

textual analysis 37, 108, 217 

The Cheese and the Worms 77, 83, 85, 95 

The Fractal Geometry of Nature 288, 331 

The Great Cat Massacre 85 

The Inner Life of Empires 289, 314, 318, 321, 332 

The Lysander Flights 10, 23 

The Möbius Trip 151, 164, 468 

The Question Concerning Technology 249, 258 

The Return of Martin Guerre 85 

The Stanford GraphBase 376, 377, 404, 405 

The Two Princes of Calabar 289, 314, 318, 321, 332 

theory of fractals 288 

theory of mind 161-63, 187 

thick mapping 97, 109, 115 

thinkering 1, 54, 72, 287 

TikTok 129 

tomography 207 

Tools of Knowledge 10, 23, 24, 467 

topic modeling 12, 47, 48, 53, 67, 90, 94, 147, 
247, 280, 282, 287-89, 293, 313, 314, 
323-25, 330, 331, 456 

Transfiguration Cathedral 190-92, 200, 209, 
210, 468 

Transkribus 66, 80 

transmedia storytelling 348, 349 

transmediality 123 

transmission control protocol 249 

T-SNE 41 

turn 

- digital 21 

- network 368, 403 

- spatial 87, 260 

TV crime shows 158 

TV series adaptations 151, 153, 185, 468 


Twitter 99, 109, 116, 119, 122, 123, 126, 129, 132, 
141-44, 146, 148-50, 404, 453 
TXM 48 


ubiquitous control 239 

ubiquitous systemic control 240 

ubiquity 78, 239, 248 

UMAP 41, 457, 465 See uniform manifold 
approximation and projection for 
dimension reduction 

uniform manifold approximation and projection 
for dimension reduction 465 

Unique Forms of Continuity in Space 449, 450, 466 

universality 156, 238 


VBA See Visual Basic for Applications 
Vermeer's Hat 289, 292, 314, 318, 321, 330 
Veronese wills 80, 81 

Vertigo effect 202 

very long shot 154 

VIAF 25 

Videana 164 

video-capturing 202 

View from Delft 291 

Vine 126 

Virginia Gazette 102, 103, 111 

virtual gallery 192 

virtual reality 37, 197, 202, 413 
virtual space 33, 193, 200 

Visual Basic for Applications 293 
visual framing 152 

VLS 154 See very long shot 

VOC 291 See Dutch East India Company 
volatility 240 

Voyant 62, 80, 96 

VR technologies 202 


weak surrogate 198 

weather conditions 109, 452, 453 

Weather Map 13, 449, 455-57, 459, 460, 462, 
463, 465, 466 

Web of Science 13, 411, 412 

Weimar Republic 10, 43, 45, 46, 49, 53, 54, 
57-62, 64, 67, 70, 72, 73 

Wikipedia 121-24, 150, 455, 459 

Wikishark 123, 150 

word embeddings 48, 464 


WordPress 350 
Words and Deeds in Renaissance Rome 85 
world history 85, 94, 290, 291 


XML 283, 290, 294 
X-ray(s) 38, 207, 208, 211 


YouTube 120, 130-33, 139, 141, 143, 144, 150 


z-editor 290 

Zipf's law 288 

z-lexias 290 

zoom 

- cognitive 31, 37 

- cosmic 8, 14 

- dolly 202 

- filmic 20 
- Hitchcock 202 

- long 287, 331 

zoom lens 6, 7 

zoom shot 7, 21 

zoomable text 12, 287, 290, 313, 325 

zoom-in(s) 11, 30, 40, 189, 190, 192, 193, 196, 197, 
199, 200, 203, 205, 207, 209, 257, 291, 305 

zooming 

- cinematic 9 

- illusion of 427 

- metaphor of 7, 12, 44, 262, 270 

zoom-out 11, 189, 190, 196, 199, 200, 201, 203, 
209, 257 

zoom-zero 11, 189, 190, 200-2, 205, 209 

z-reading 290 

z-text 287, 290-93, 302, 313, 325 See zoomable 
text 